
Commit 24a85a3

sync from upstream. Still need to fix the streaming.
1 parent 0a1c750 commit 24a85a3

File tree

9 files changed: +2390 −38 lines changed

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 # Extensions
 
+*package-lock.json
+*package.json
+*node_modules
+*.ipynb
+.*.json
+
 *.a
 *.bat
 *.bin

Makefile

Lines changed: 2 additions & 0 deletions
@@ -1453,6 +1453,8 @@ endif # GGML_RPC
 llama-server: \
     examples/server/server.cpp \
     examples/server/utils.hpp \
+    examples/server/function-call-parser.hpp \
+    examples/server/function-call.hpp \
     examples/server/httplib.h \
     examples/server/colorthemes.css.hpp \
     examples/server/style.css.hpp \

README.md

Lines changed: 117 additions & 7 deletions
@@ -1,14 +1,124 @@
-# llama.cpp
+# tools.cpp
+tools.cpp is Rubra's fork of llama.cpp, offering inference of Rubra's function calling models (and others) in pure C/C++.
 
-![llama](https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png)
+## tools.cpp quickstart
+1. Build from source:
+
+- Mac users:
+```
+make
+```
+
+- Nvidia CUDA users:
+```
+make LLAMA_CUDA=1
+```
+
+2. Install a helper package that fixes some rare edge cases:
+```
+npm install jsonrepair
+```
 
-[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
-[![Server](https://github.com/ggerganov/llama.cpp/actions/workflows/server.yml/badge.svg)](https://github.com/ggerganov/llama.cpp/actions/workflows/server.yml)
-[![Conan Center](https://shields.io/conan/v/llama-cpp)](https://conan.io/center/llama-cpp)
+3. Download a compatible Rubra GGUF model:
+For example:
+```
+wget https://huggingface.co/rubra-ai/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/rubra-meta-llama-3-8b-instruct.Q6_K.gguf
+```
+
+For large multi-part model files, such as [rubra-meta-llama-3-70b-instruct_Q6_K-0000*-of-00003.gguf](https://huggingface.co/rubra-ai/Meta-Llama-3-70B-Instruct-GGUF/tree/main), use the following command to merge them before proceeding to the next step:
+```
+./llama-gguf-split --merge rubra-meta-llama-3-70b-instruct_Q6_K-0000*-of-00003.gguf rubra-meta-llama-3-70b-instruct_Q6_K.gguf
+```
+This merges the multi-part model files into a single GGUF file, `rubra-meta-llama-3-70b-instruct_Q6_K.gguf`.
+
+4. Start an OpenAI-compatible server:
+```
+./llama-server -ngl 37 -m rubra-meta-llama-3-8b-instruct.Q6_K.gguf --port 1234 --host 0.0.0.0 -c 8000 --chat-template llama3
+```
 
-[Roadmap](https://github.com/users/ggerganov/projects/7) / [Project status](https://github.com/ggerganov/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggerganov/llama.cpp/discussions/205) / [ggml](https://github.com/ggerganov/ggml)
+5. Test that the server is available:
+```bash
+curl localhost:1234/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer tokenabc-123" \
+  -d '{
+    "model": "rubra-model",
+    "messages": [
+      {
+        "role": "system",
+        "content": "You are a helpful assistant."
+      },
+      {
+        "role": "user",
+        "content": "hello"
+      }
+    ]
+  }'
+```
+
+6. Try a Python function calling example:
+```python
+# if openai is not installed, run `pip install openai`
+from openai import OpenAI
+client = OpenAI(api_key="123", base_url="http://localhost:1234/v1/")
+
+tools = [
+  {
+    "type": "function",
+    "function": {
+      "name": "get_current_weather",
+      "description": "Get the current weather in a given location",
+      "parameters": {
+        "type": "object",
+        "properties": {
+          "location": {
+            "type": "string",
+            "description": "The city and state, e.g. San Francisco, CA",
+          },
+          "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+        },
+        "required": ["location"],
+      },
+    }
+  }
+]
+messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
+completion = client.chat.completions.create(
+    model="rubra-model",
+    messages=messages,
+    tools=tools,
+    tool_choice="auto"
+)
+
+print(completion)
+```
+
+The output should look like this:
+```
+ChatCompletion(id='chatcmpl-EmHd8kai4DVwBUOyim054GmfcyUbjiLf', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='e885974b', function=Function(arguments='{"location":"Boston"}', name='get_current_weather'), type='function')]))], created=1719528056, model='rubra-model', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=29, prompt_tokens=241, total_tokens=270))
+```
+
+That's it! Make sure you turn `stream` off when making API calls to the server; streaming is not supported yet, but support is coming soon.
+
+For more function calling examples, check out the `test_llamacpp.ipynb` notebook.
+
+### Choosing a Chat Template for Different Models
+
+| Model   | Chat Template |
+|---------|:-------------:|
+| Llama3  | llama3        |
+| Mistral | llama2        |
+| Phi3    | phi3          |
+| Gemma   | gemma         |
+| Qwen2   | chatml        |
+
+For example, to run [Rubra's enhanced Phi3 model](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct-function-calling-alpha-v1-GGUF), use the following command:
+
+```bash
+./llama-server -ngl 37 -m phi-3-mini-128k-instruct-function-calling-alpha-v1.Q8_0.gguf --port 1234 --host 0.0.0.0 -c 32000 --chat-template phi3
+```
 
-Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others) in pure C/C++
+==============================================================================
 
 ## Recent API changes

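Since the commit message notes that streaming is still unfixed, here is a minimal client-side sketch of an explicitly non-streaming request against the server from step 4. It reuses the endpoint, model name, and weather-tool schema from the README example above, assumes only a running server on localhost:1234, and reads the same fields shown in the sample `ChatCompletion` output:

```python
# Minimal sketch: explicit non-streaming request against the llama-server from step 4.
# Assumes the server is running on localhost:1234 with a Rubra function-calling model.
from openai import OpenAI

client = OpenAI(api_key="123", base_url="http://localhost:1234/v1/")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

completion = client.chat.completions.create(
    model="rubra-model",
    messages=[{"role": "user", "content": "What's the weather like in Boston today?"}],
    tools=tools,
    tool_choice="auto",
    stream=False,  # streaming is not supported by this fork yet, so keep it off
)

# These field accesses mirror the sample ChatCompletion output shown above.
for call in completion.choices[0].message.tool_calls or []:
    print(call.id, call.function.name, call.function.arguments)
```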
Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@

#include <iostream>
#include <fstream>
#include "json.hpp"
#include <regex>
#include <memory>

using json = nlohmann::ordered_json;

std::string generate_uuid() {
    static std::random_device rd;
    static std::mt19937 generator(rd());
    static std::uniform_int_distribution<int> distribution(0, 15);

    const char *v = "0123456789abcdef";
    std::stringstream uuid;

    for (int i = 0; i < 8; ++i) {
        uuid << v[distribution(generator)];
    }
    return uuid.str();
}

std::string jsonrepair(const std::string value) {
    std::array<char, 128> buffer;
    std::string result;
    // Ensure the command passed to popen() is null-terminated
    std::string tmpfile_name = "." + generate_uuid() + ".json";
    std::ofstream outfile(tmpfile_name);
    outfile << value; // Assuming jsonStr contains your JSON string
    outfile.close();
    std::string command = "node jsonrepair.ts " + tmpfile_name;
    std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(command.c_str(), "r"), pclose);
    if (!pipe) {
        throw std::runtime_error("popen() failed!");
    }
    while (fgets(buffer.data(), buffer.size(), pipe.get()) != nullptr) {
        result += buffer.data();
    }
    return result;
}

json parse_if_json(const std::string& value) {
    try {
        // json repair here
        return json::parse(jsonrepair(value));
    } catch (const json::parse_error&) {
        return value; // Return the original string if parsing fails
    }
}

std::string clean_command_string(const std::string& command_str) {
    std::string cleaned_command = std::regex_replace(command_str, std::regex(R"(\\(?!["\\/bfnrt]|u[a-fA-F0-9]{4}))"), "");
    cleaned_command = std::regex_replace(cleaned_command, std::regex(R"(\\")"), "\"");

    if (cleaned_command.front() == '"' && cleaned_command.back() == '"') {
        cleaned_command = cleaned_command.substr(1, cleaned_command.size() - 2);
    }
    return cleaned_command;
}

json clean_json_strings(const std::string& input_str) {
    try {
        // json repair here
        std::string fixed_str = jsonrepair(input_str);
        json data = json::parse(fixed_str);
        for (auto& [key, value] : data.items()) {
            if (value.is_string()) {
                std::string val = value.get<std::string>();
                if (val.front() == '{' || val.front() == '[') {
                    data[key] = parse_if_json(val);
                } else {
                    data[key] = clean_command_string(val);
                }
            } else if (value.is_object()) {
                for (auto& [k, v] : value.items()) {
                    if (v.is_string()) {
                        v = clean_command_string(v.get<std::string>());
                    }
                }
            }
        }
        return data;
    } catch (const json::parse_error& e) {
        std::cerr << "Error decoding JSON: " << e.what() << std::endl;
        return nullptr;
    }
}

std::vector<json> rubra_fc_json_tool_extractor(const std::string& output_str) {
    std::vector<json> result;
    std::cout << "Output to Parse : " << output_str.c_str() << std::endl;
    if (output_str.find("endtoolcall") == std::string::npos) {
        return result;
    }

    std::vector<std::string> listOfStrToParse;
    size_t start = 0, end = 0;

    // Iterate until all instances of "endtoolcall" are processed
    while ((end = output_str.find("endtoolcall", start)) != std::string::npos) {
        std::string segment = output_str.substr(start, end - start);
        size_t pos = segment.find("starttoolcall");
        if (pos != std::string::npos) {
            // Extract substring after "toolcall"
            std::string ss = segment.substr(pos + std::string("starttoolcall").length());
            listOfStrToParse.push_back(ss);
        }
        start = end + std::string("endtoolcall").length(); // Move past the "endtoolcall"
    }

    std::vector<json> function_call_json;

    try {
        for (const auto & line : listOfStrToParse) {
            // json fc = json::parse(line);
            json fc = clean_json_strings(line);
            if (!fc["arguments"].is_string()) {
                fc["arguments"] = fc["arguments"].dump();
            }
            if (!fc.is_null()) {
                function_call_json.push_back(fc);
            }
        }
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    for (const auto& fc : function_call_json) {
        json func_call;
        func_call["id"] = generate_uuid();
        func_call["name"] = fc["name"];
        func_call["kwargs"] = fc["arguments"];
        func_call["type"] = "function";
        result.push_back(func_call);
    }

    return result;
}

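To make the new parser's contract concrete: the model is expected to wrap each call between `starttoolcall` and `endtoolcall` markers, each segment is repaired and parsed as JSON with `name` and `arguments` keys, and every call is returned with a generated 8-character hex id plus `name`, `kwargs`, and `type` fields. The sketch below mirrors that flow in Python purely for illustration; the server uses the C++ code above, and the `node jsonrepair.ts` repair pass (whose temporary `.<uuid>.json` files are what the new `.*.json` entry in .gitignore covers) is replaced here by plain `json.loads`:

```python
# Illustrative Python mirror of rubra_fc_json_tool_extractor (see the C++ above).
# The `node jsonrepair.ts` repair step is skipped; json.loads stands in for it.
import json
import uuid

def extract_tool_calls(output_str: str) -> list:
    calls = []
    start = 0
    # Collect every "starttoolcall ... endtoolcall" segment, in order.
    while (end := output_str.find("endtoolcall", start)) != -1:
        segment = output_str[start:end]
        pos = segment.find("starttoolcall")
        if pos != -1:
            raw = segment[pos + len("starttoolcall"):]
            fc = json.loads(raw)  # the C++ version routes this through jsonrepair first
            if not isinstance(fc["arguments"], str):
                # arguments are normalized to a JSON string, as in the C++ version
                fc["arguments"] = json.dumps(fc["arguments"])
            calls.append({
                "id": uuid.uuid4().hex[:8],  # the C++ version generates an 8-char hex id
                "name": fc["name"],
                "kwargs": fc["arguments"],
                "type": "function",
            })
        start = end + len("endtoolcall")
    return calls

# Example model output containing a single tool call:
sample = 'starttoolcall{"name": "get_current_weather", "arguments": {"location": "Boston"}}endtoolcall'
print(extract_tool_calls(sample))
# [{'id': '...', 'name': 'get_current_weather', 'kwargs': '{"location": "Boston"}', 'type': 'function'}]
```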