This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 58c071c

vansangpfiev, sangjanai, and OHaiiBuzzle authored
chore: sync main to dev (#1978)
* feat: AMD hardware API (#1797)
  * feat: add amd gpu windows
  * chore: remove unused code
  * feat: get amd gpus
  * fix: clean
  * chore: cleanup
  * fix: set activate
  * fix: build windows
  * feat: linux
  * fix: add patches
  * fix: map cuda gpus
  * fix: build
  * chore: docs
  * fix: build
  * chore: clean up
  * fix: build
  * fix: build
  * chore: pack vulkan windows
  * chore: vulkan linux
  Co-authored-by: vansangpfiev <[email protected]>
* fix: add cpu usage (#1868)
  Co-authored-by: vansangpfiev <[email protected]>
* fix: PATCH method for Thread and Messages management (#1923)
  Co-authored-by: vansangpfiev <[email protected]>
* fix: ignore compute_cap if not present (#1866)
  * fix: ignore compute_cap if not present
  * fix: correct gpu info
  * fix: remove check for toolkit version
  Co-authored-by: vansangpfiev <[email protected]>
* fix: models.cc: symlinked model deletion shouldn't remove original file (#1918)
  Co-authored-by: vansangpfiev <[email protected]>
* fix: correct gpu info list (#1944)
  * fix: correct gpu info list
  * chore: cleanup
  Co-authored-by: vansangpfiev <[email protected]>
* fix: gpu: filter out llvmpipe
* fix: add vendor in gpu info (#1952)
  Co-authored-by: vansangpfiev <[email protected]>
* fix: correct get server name method (#1953)
  Co-authored-by: vansangpfiev <[email protected]>
* fix: map nvidia and vulkan uuid (#1954)
  Co-authored-by: vansangpfiev <[email protected]>
* fix: permission issue for default drogon uploads folder (#1870)
  Co-authored-by: vansangpfiev <[email protected]>
* chore: change timeout
* fix: make get hardware info function thread-safe (#1956)
  Co-authored-by: vansangpfiev <[email protected]>
* fix: cache data for gpu information (#1959)
  * fix: wrap vulkan gpu function
  * fix: init
  * fix: cpu usage
  * fix: build windows
  * fix: buld macos
  Co-authored-by: vansangpfiev <[email protected]>
* fix: handle path with space (#1963)
* fix: unload engine before updating (#1970)
  Co-authored-by: sangjanai <[email protected]>
* fix: auto-reload model for remote engine (#1971)
  Co-authored-by: sangjanai <[email protected]>
* fix: use updated configuration for remote model when reload (#1972)
  Co-authored-by: sangjanai <[email protected]>
* fix: correct engine interface order (#1974)
  Co-authored-by: sangjanai <[email protected]>
* fix: improve error handling for remote engine (#1975)
  Co-authored-by: sangjanai <[email protected]>
* fix: temporarily remove model setting recommendation (#1977)
  Co-authored-by: sangjanai <[email protected]>

Co-authored-by: vansangpfiev <[email protected]>
Co-authored-by: OHaiiBuzzle <[email protected]>
1 parent bb6d60b commit 58c071c

22 files changed: +352 −220 lines changed

docs/docs/architecture/cortex-db.mdx

Lines changed: 3 additions & 5 deletions

```diff
@@ -15,15 +15,14 @@ import TabItem from "@theme/TabItem";
 This document outlines Cortex database architecture which is designed to store and manage models, engines,
 files and more.
 
-## Tables Structure
-
+## Table Structure
 ### schema Table
-
 The `schema` table is designed to hold schema version for cortex database. Below is the structure of the table:
 
 | Column Name | Data Type | Description |
 |--------------------|-----------|---------------------------------------------------------|
-| version | INTEGER | A unique schema version for database. |
+| schema_version | INTEGER | A unique schema version for database. |
+
 
 ### models Table
 The `models` table is designed to hold metadata about various AI models. Below is the structure of the table:
@@ -53,7 +52,6 @@ The `hardware` table is designed to hold metadata about hardware information. Be
 | activated | INTEGER | A boolean value (0 or 1) indicating whether the hardware is activated or not. |
 | priority | INTEGER | An integer value representing the priority associated with the hardware. |
 
-
 ### engines Table
 The `engines` table is designed to hold metadata about the different engines available for useage with Cortex.
 Below is the structure of the table:
```

engine/CMakeLists.txt

Lines changed: 0 additions & 1 deletion

```diff
@@ -73,7 +73,6 @@ if(CMAKE_BUILD_INJA_TEST)
   add_subdirectory(examples/inja)
 endif()
 
-
 find_package(jsoncpp CONFIG REQUIRED)
 find_package(Drogon CONFIG REQUIRED)
 find_package(yaml-cpp CONFIG REQUIRED)
```

engine/cli/commands/server_start_cmd.cc

Lines changed: 6 additions & 6 deletions

```diff
@@ -66,16 +66,16 @@ bool ServerStartCmd::Exec(const std::string& host, int port,
   si.cb = sizeof(si);
   ZeroMemory(&pi, sizeof(pi));
   std::wstring params = L"--start-server";
-  params += L" --config_file_path " +
-            file_manager_utils::GetConfigurationPath().wstring();
-  params += L" --data_folder_path " +
-            file_manager_utils::GetCortexDataPath().wstring();
+  params += L" --config_file_path \"" +
+            file_manager_utils::GetConfigurationPath().wstring() + L"\"";
+  params += L" --data_folder_path \"" +
+            file_manager_utils::GetCortexDataPath().wstring() + L"\"";
   params += L" --loglevel " + cortex::wc::Utf8ToWstring(log_level_);
   std::wstring exe_w = cortex::wc::Utf8ToWstring(exe);
   std::wstring current_path_w =
       file_manager_utils::GetExecutableFolderContainerPath().wstring();
-  std::wstring wcmds = current_path_w + L"/" + exe_w + L" " + params;
-  CTL_DBG("wcmds: " << wcmds);
+  std::wstring wcmds = current_path_w + L"\\" + exe_w + L" " + params;
+  CTL_INF("wcmds: " << wcmds);
   std::vector<wchar_t> mutable_cmds(wcmds.begin(), wcmds.end());
   mutable_cmds.push_back(L'\0');
   // Create child process
```
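The quoted-path fix above (PR #1963) wraps the `--config_file_path` and `--data_folder_path` values in double quotes so installation directories containing spaces survive Windows command-line splitting. A minimal sketch of the idea; `QuoteArg` and `BuildParams` are hypothetical helpers, not functions from the Cortex codebase:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wrap an argument in double quotes so a path with
// spaces is passed to the child process as a single token.
std::wstring QuoteArg(const std::wstring& arg) {
  return L"\"" + arg + L"\"";
}

// Assemble the server parameters the way the patched code does.
std::wstring BuildParams(const std::wstring& config_path,
                         const std::wstring& data_path) {
  std::wstring params = L"--start-server";
  params += L" --config_file_path " + QuoteArg(config_path);
  params += L" --data_folder_path " + QuoteArg(data_path);
  return params;
}
```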

engine/common/hardware_common.h

Lines changed: 5 additions & 1 deletion

```diff
@@ -79,6 +79,7 @@ struct GPU {
   int64_t total_vram;
   std::string uuid;
   bool is_activated = true;
+  std::string vendor;
 };
 
 inline Json::Value ToJson(const std::vector<GPU>& gpus) {
@@ -100,7 +101,10 @@ inline Json::Value ToJson(const std::vector<GPU>& gpus) {
     gpu["total_vram"] = gpus[i].total_vram;
     gpu["uuid"] = gpus[i].uuid;
     gpu["activated"] = gpus[i].is_activated;
-    res.append(gpu);
+    gpu["vendor"] = gpus[i].vendor;
+    if (gpus[i].total_vram > 0) {
+      res.append(gpu);
+    }
   }
   return res;
 }
```
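The new guard skips GPUs that report zero VRAM when building the JSON list, which filters out software renderers such as llvmpipe (see the `fix: gpu: filter out llvmpipe` commit in this merge). A simplified, jsoncpp-free sketch of the filter; the struct and function here are illustrative stand-ins:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative stand-in for the GPU struct above.
struct GpuInfo {
  std::string vendor;
  int64_t total_vram = 0;
};

// Keep only GPUs that report usable VRAM, mirroring the
// `if (gpus[i].total_vram > 0)` guard in ToJson.
std::vector<GpuInfo> FilterReportableGpus(const std::vector<GpuInfo>& gpus) {
  std::vector<GpuInfo> res;
  for (const auto& g : gpus) {
    if (g.total_vram > 0) {
      res.push_back(g);
    }
  }
  return res;
}
```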

engine/controllers/engines.cc

Lines changed: 5 additions & 0 deletions

```diff
@@ -375,17 +375,21 @@ void Engines::UpdateEngine(
       metadata = (*exist_engine).metadata;
     }
 
+    (void)engine_service_->UnloadEngine(engine);
+
     auto upd_res =
         engine_service_->UpsertEngine(engine, type, api_key, url, version,
                                       "all-platforms", status, metadata);
     if (upd_res.has_error()) {
       Json::Value res;
       res["message"] = upd_res.error();
+      CTL_WRN("Error: " << upd_res.error());
       auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
       resp->setStatusCode(k400BadRequest);
       callback(resp);
     } else {
       Json::Value res;
+      CTL_INF("Remote Engine update successfully!");
       res["message"] = "Remote Engine update successfully!";
       auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
       resp->setStatusCode(k200OK);
@@ -394,6 +398,7 @@ void Engines::UpdateEngine(
   } else {
     Json::Value res;
     res["message"] = "Request body is empty!";
+    CTL_WRN("Error: Request body is empty!");
     auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
     resp->setStatusCode(k400BadRequest);
     callback(resp);
```
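The ordering fix here (PR #1970) unloads the engine before upserting its configuration, so a loaded remote engine cannot keep serving with stale settings. A hypothetical sketch of that ordering, with invented names rather than the real service types:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative registry: Update unloads first, then persists the new
// configuration, mirroring UnloadEngine-before-UpsertEngine above.
struct EngineRegistry {
  std::map<std::string, std::string> loaded;  // engine -> in-memory config
  std::map<std::string, std::string> stored;  // persisted config

  void Unload(const std::string& e) { loaded.erase(e); }
  void Upsert(const std::string& e, const std::string& cfg) { stored[e] = cfg; }

  void Update(const std::string& e, const std::string& cfg) {
    Unload(e);       // mirrors (void)engine_service_->UnloadEngine(engine);
    Upsert(e, cfg);  // then persist the new configuration
  }
};
```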

engine/controllers/models.cc

Lines changed: 5 additions & 4 deletions

```diff
@@ -218,10 +218,11 @@ void Models::ListModel(
         obj["id"] = model_entry.model;
         obj["model"] = model_entry.model;
         obj["status"] = "downloaded";
-        auto es = model_service_->GetEstimation(model_entry.model);
-        if (es.has_value() && !!es.value()) {
-          obj["recommendation"] = hardware::ToJson(*(es.value()));
-        }
+        // TODO(sang) Temporarily remove this estimation
+        // auto es = model_service_->GetEstimation(model_entry.model);
+        // if (es.has_value() && !!es.value()) {
+        //   obj["recommendation"] = hardware::ToJson(*(es.value()));
+        // }
         data.append(std::move(obj));
         yaml_handler.Reset();
       } else if (model_config.engine == kPythonEngine) {
```

engine/cortex-common/EngineI.h

Lines changed: 3 additions & 3 deletions

```diff
@@ -59,14 +59,14 @@ class EngineI {
                         const std::string& log_path) = 0;
   virtual void SetLogLevel(trantor::Logger::LogLevel logLevel) = 0;
 
+  // Stop inflight chat completion in stream mode
+  virtual void StopInferencing(const std::string& model_id) = 0;
+
   virtual Json::Value GetRemoteModels() = 0;
   virtual void HandleRouteRequest(
       std::shared_ptr<Json::Value> json_body,
       std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
   virtual void HandleInference(
       std::shared_ptr<Json::Value> json_body,
       std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
-
-  // Stop inflight chat completion in stream mode
-  virtual void StopInferencing(const std::string& model_id) = 0;
 };
```

engine/extensions/remote-engine/remote_engine.cc

Lines changed: 9 additions & 2 deletions

```diff
@@ -29,8 +29,13 @@ size_t StreamWriteCallback(char* ptr, size_t size, size_t nmemb,
     CTL_DBG(chunk);
     Json::Value check_error;
     Json::Reader reader;
-    if (reader.parse(chunk, check_error)) {
+    context->chunks += chunk;
+    if (reader.parse(context->chunks, check_error) ||
+        (reader.parse(chunk, check_error) &&
+         chunk.find("error") != std::string::npos)) {
+      CTL_WRN(context->chunks);
       CTL_WRN(chunk);
+      CTL_INF("Request: " << context->last_request);
       Json::Value status;
       status["is_done"] = true;
       status["has_error"] = true;
@@ -143,7 +148,9 @@ CurlResponse RemoteEngine::MakeStreamingChatCompletionRequest(
       "",
       config.model,
       renderer_,
-      stream_template};
+      stream_template,
+      true,
+      body};
 
   curl_easy_setopt(curl, CURLOPT_URL, full_url.c_str());
   curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
```
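The buffering change accumulates partial stream chunks in `context->chunks`, so an error payload split across several curl write callbacks is still detected once the accumulated text parses. A toy sketch of the accumulate-then-check pattern; the balanced-brace check below is only a stand-in for the real `Json::Reader::parse`:

```cpp
#include <cassert>
#include <string>

// Toy completeness check standing in for Json::Reader::parse: a buffer is
// "complete" once it contains at least one brace and braces balance.
bool LooksComplete(const std::string& s) {
  int depth = 0;
  bool seen = false;
  for (char c : s) {
    if (c == '{') { ++depth; seen = true; }
    else if (c == '}') { --depth; }
  }
  return seen && depth == 0;
}

struct ChunkBuffer {
  std::string chunks;
  // Append a partial chunk and report whether the accumulated buffer now
  // forms a complete document, as the patched StreamWriteCallback does.
  bool Append(const std::string& chunk) {
    chunks += chunk;
    return LooksComplete(chunks);
  }
};
```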

engine/extensions/remote-engine/remote_engine.h

Lines changed: 2 additions & 0 deletions

```diff
@@ -25,6 +25,8 @@ struct StreamContext {
   extensions::TemplateRenderer& renderer;
   std::string stream_template;
   bool need_stop = true;
+  std::string last_request;
+  std::string chunks;
 };
 struct CurlResponse {
   std::string body;
```

engine/services/engine_service.cc

Lines changed: 1 addition & 1 deletion

```diff
@@ -870,10 +870,10 @@ cpp::result<void, std::string> EngineService::UnloadEngine(
     auto unload_opts = EngineI::EngineUnloadOption{};
     e->Unload(unload_opts);
     delete e;
-    engines_.erase(ne);
   } else {
     delete std::get<RemoteEngineI*>(engines_[ne].engine);
   }
+  engines_.erase(ne);
 
   CTL_DBG("Engine unloaded: " + ne);
   return {};
```
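The fix moves `engines_.erase(ne)` out of the local-engine branch so the registry entry is removed for remote engines too; before, a remote engine's entry survived its `delete` and left a dangling pointer in the map. A simplified sketch with illustrative types:

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified registry: unload must erase the entry no matter which branch
// freed the engine, as the patched UnloadEngine now does.
struct Registry {
  std::map<std::string, int> engines_;

  void Unload(const std::string& ne, bool is_local) {
    if (is_local) {
      // local engine: run unload hooks, then free (elided in this sketch)
    } else {
      // remote engine: just free (elided in this sketch)
    }
    engines_.erase(ne);  // now unconditional
  }
};
```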

engine/services/hardware_service.cc

Lines changed: 12 additions & 9 deletions

```diff
@@ -38,6 +38,7 @@ bool TryConnectToServer(const std::string& host, int port) {
 
 HardwareInfo HardwareService::GetHardwareInfo() {
   // append active state
+  std::lock_guard<std::mutex> l(mtx_);
   auto gpus = cortex::hw::GetGPUInfo();
   auto res = db_service_->LoadHardwareList();
   if (res.has_value()) {
@@ -63,7 +64,8 @@ bool HardwareService::Restart(const std::string& host, int port) {
   namespace luh = logging_utils_helper;
   if (!ahc_)
     return true;
-  auto exe = commands::GetCortexServerBinary();
+  auto exe = file_manager_utils::Subtract(
+      file_manager_utils::GetExecutablePath(), cortex_utils::GetCurrentPath());
   auto get_config_file_path = []() -> std::string {
     if (file_manager_utils::cortex_config_file_path.empty()) {
       return file_manager_utils::GetConfigurationPath().string();
@@ -144,16 +146,17 @@ bool HardwareService::Restart(const std::string& host, int port) {
   ZeroMemory(&pi, sizeof(pi));
   // TODO (sang) write a common function for this and server_start_cmd
   std::wstring params = L"--ignore_cout";
-  params += L" --config_file_path " +
-            file_manager_utils::GetConfigurationPath().wstring();
-  params += L" --data_folder_path " +
-            file_manager_utils::GetCortexDataPath().wstring();
+  params += L" --config_file_path \"" +
+            file_manager_utils::GetConfigurationPath().wstring() + L"\"";
+  params += L" --data_folder_path \"" +
+            file_manager_utils::GetCortexDataPath().wstring() + L"\"";
   params += L" --loglevel " +
            cortex::wc::Utf8ToWstring(luh::LogLevelStr(luh::global_log_level));
-  std::wstring exe_w = cortex::wc::Utf8ToWstring(exe);
+  std::wstring exe_w = exe.wstring();
   std::wstring current_path_w =
       file_manager_utils::GetExecutableFolderContainerPath().wstring();
-  std::wstring wcmds = current_path_w + L"/" + exe_w + L" " + params;
+  std::wstring wcmds = current_path_w + L"\\" + exe_w + L" " + params;
+  CTL_DBG("wcmds: " << wcmds);
   std::vector<wchar_t> mutable_cmds(wcmds.begin(), wcmds.end());
   mutable_cmds.push_back(L'\0');
   // Create child process
@@ -185,7 +188,7 @@ bool HardwareService::Restart(const std::string& host, int port) {
   auto dylib_path_mng = std::make_shared<cortex::DylibPathManager>();
   auto db_srv = std::make_shared<DatabaseService>();
   EngineService(download_srv, dylib_path_mng, db_srv).RegisterEngineLibPath();
-  std::string p = cortex_utils::GetCurrentPath() + "/" + exe;
+  std::string p = cortex_utils::GetCurrentPath() / exe;
   commands.push_back(p);
   commands.push_back("--ignore_cout");
   commands.push_back("--config_file_path");
@@ -486,7 +489,7 @@ std::vector<int> HardwareService::GetCudaConfig() {
   // Map uuid back to nvidia id
   for (auto const& uuid : uuids) {
     for (auto const& ngpu : nvidia_gpus) {
-      if (uuid == ngpu.uuid) {
+      if (ngpu.uuid.find(uuid) != std::string::npos) {
         res.push_back(std::stoi(ngpu.id));
       }
     }
```
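The UUID comparison in `GetCudaConfig` switches from exact equality to a substring search (PR #1954), presumably because Vulkan and NVIDIA tooling format the same device UUID differently, so one may only appear embedded inside the other. A sketch of the mapping loop with simplified types:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Map reported UUIDs back to NVIDIA device ids; a match is a substring hit,
// as in the patched GetCudaConfig loop. Types are simplified for the sketch.
std::vector<int> MapUuidsToIds(
    const std::vector<std::string>& uuids,
    const std::vector<std::pair<std::string, int>>& nvidia_gpus) {
  std::vector<int> res;
  for (const auto& uuid : uuids) {
    for (const auto& [ngpu_uuid, id] : nvidia_gpus) {
      if (ngpu_uuid.find(uuid) != std::string::npos) {
        res.push_back(id);
      }
    }
  }
  return res;
}
```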

engine/services/hardware_service.h

Lines changed: 2 additions & 0 deletions

```diff
@@ -2,6 +2,7 @@
 #include <stdint.h>
 #include <string>
 #include <vector>
+#include <mutex>
 
 #include "common/hardware_config.h"
 #include "database_service.h"
@@ -39,4 +40,5 @@ class HardwareService {
  private:
   std::shared_ptr<DatabaseService> db_service_ = nullptr;
   std::optional<cortex::hw::ActivateHardwareConfig> ahc_;
+  std::mutex mtx_;
 };
```
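The added `mtx_` member backs the thread-safety fix to `GetHardwareInfo` (#1956): a `std::lock_guard` serializes concurrent callers for the duration of the call. A minimal sketch of the pattern with illustrative names, not the real service:

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Guard every access to shared state with a std::lock_guard so concurrent
// callers serialize; the lock releases automatically at scope exit.
class InfoCache {
 public:
  void Set(const std::string& v) {
    std::lock_guard<std::mutex> l(mtx_);
    value_ = v;
  }
  std::string Get() {
    std::lock_guard<std::mutex> l(mtx_);
    return value_;
  }

 private:
  std::mutex mtx_;
  std::string value_;
};
```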

engine/services/inference_service.cc

Lines changed: 10 additions & 8 deletions

```diff
@@ -24,8 +24,12 @@ cpp::result<void, InferResult> InferenceService::HandleChatCompletion(
     auto status = std::get<0>(ir)["status_code"].asInt();
     if (status != drogon::k200OK) {
       CTL_INF("Model is not loaded, start loading it: " << model_id);
-      auto res = LoadModel(saved_models_.at(model_id));
-      // ignore return result
+      // For remote engine, we use the updated configuration
+      if (engine_service_->IsRemoteEngine(engine_type)) {
+        (void)model_service_.lock()->StartModel(model_id, {}, false);
+      } else {
+        (void)LoadModel(saved_models_.at(model_id));
+      }
     }
   }
 
@@ -38,7 +42,7 @@ cpp::result<void, InferResult> InferenceService::HandleChatCompletion(
     LOG_WARN << "Engine is not loaded yet";
     return cpp::fail(std::make_pair(stt, res));
   }
-
+
   if (!model_id.empty()) {
     if (auto model_service = model_service_.lock()) {
       auto metadata_ptr = model_service->GetCachedModelMetadata(model_id);
@@ -72,7 +76,6 @@ cpp::result<void, InferResult> InferenceService::HandleChatCompletion(
     }
   }
 
-
   CTL_DBG("Json body inference: " + json_body->toStyledString());
 
   auto cb = [q, tool_choice](Json::Value status, Json::Value res) {
@@ -217,10 +220,9 @@ InferResult InferenceService::LoadModel(
     std::get<RemoteEngineI*>(engine_result.value())
         ->LoadModel(json_body, std::move(cb));
   }
-  if (!engine_service_->IsRemoteEngine(engine_type)) {
-    auto model_id = json_body->get("model", "").asString();
-    saved_models_[model_id] = json_body;
-  }
+  // Save model config to reload if needed
+  auto model_id = json_body->get("model", "").asString();
+  saved_models_[model_id] = json_body;
  return std::make_pair(stt, r);
 }
```
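With the remote-engine guard removed, every successful load now records the request body keyed by model id, so both local and remote models can be auto-reloaded later (#1971). A simplified sketch of that bookkeeping; the real code stores the `Json::Value` request body rather than a string:

```cpp
#include <cassert>
#include <map>
#include <string>

// Save each loaded model's config keyed by model id so a later request can
// reload the model with the same configuration.
struct SavedModels {
  std::map<std::string, std::string> configs;

  void Save(const std::string& id, const std::string& cfg) {
    configs[id] = cfg;
  }
  bool CanReload(const std::string& id) const {
    return configs.count(id) > 0;
  }
};
```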

engine/services/model_service.cc

Lines changed: 2 additions & 0 deletions

```diff
@@ -1233,6 +1233,8 @@ cpp::result<std::optional<std::string>, std::string>
 ModelService::MayFallbackToCpu(const std::string& model_path, int ngl,
                                int ctx_len, int n_batch, int n_ubatch,
                                const std::string& kv_cache_type) {
+  // TODO(sang) temporary disable this function
+  return std::nullopt;
   assert(hw_service_);
   auto hw_info = hw_service_->GetHardwareInfo();
   assert(!!engine_svc_);
```

engine/services/model_source_service.cc

Lines changed: 1 addition & 2 deletions

```diff
@@ -475,14 +475,13 @@ ModelSourceService::AddCortexsoRepoBranch(const std::string& model_source,
 
 void ModelSourceService::SyncModelSource() {
   while (running_) {
-    std::this_thread::sleep_for(std::chrono::milliseconds(500));
+    std::this_thread::sleep_for(std::chrono::milliseconds(100));
     auto now = std::chrono::system_clock::now();
     auto config = file_manager_utils::GetCortexConfig();
     auto last_check =
         std::chrono::system_clock::time_point(
             std::chrono::milliseconds(config.checkedForSyncHubAt)) +
         std::chrono::hours(1);
-
     if (now > last_check) {
       CTL_DBG("Start to sync cortex.db");
```

engine/utils/file_manager_utils.cc

Lines changed: 9 additions & 4 deletions

```diff
@@ -17,14 +17,15 @@
 #endif
 
 namespace file_manager_utils {
-std::filesystem::path GetExecutableFolderContainerPath() {
+
+std::filesystem::path GetExecutablePath() {
 #if defined(__APPLE__) && defined(__MACH__)
   char buffer[1024];
   uint32_t size = sizeof(buffer);
 
   if (_NSGetExecutablePath(buffer, &size) == 0) {
     // CTL_DBG("Executable path: " << buffer);
-    return std::filesystem::path{buffer}.parent_path();
+    return std::filesystem::path{buffer};
   } else {
     CTL_ERR("Failed to get executable path");
     return std::filesystem::current_path();
@@ -35,7 +36,7 @@ std::filesystem::path GetExecutableFolderContainerPath() {
   if (len != -1) {
     buffer[len] = '\0';
     // CTL_DBG("Executable path: " << buffer);
-    return std::filesystem::path{buffer}.parent_path();
+    return std::filesystem::path{buffer};
   } else {
     CTL_ERR("Failed to get executable path");
     return std::filesystem::current_path();
@@ -44,13 +45,17 @@ std::filesystem::path GetExecutableFolderContainerPath() {
   wchar_t buffer[MAX_PATH];
   GetModuleFileNameW(NULL, buffer, MAX_PATH);
   // CTL_DBG("Executable path: " << buffer);
-  return std::filesystem::path{buffer}.parent_path();
+  return std::filesystem::path{buffer};
 #else
   LOG_ERROR << "Unsupported platform!";
   return std::filesystem::current_path();
 #endif
 }
 
+std::filesystem::path GetExecutableFolderContainerPath() {
+  return GetExecutablePath().parent_path();
+}
+
 std::filesystem::path GetHomeDirectoryPath() {
 #ifdef _WIN32
   const wchar_t* homeDir = _wgetenv(L"USERPROFILE");
```
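The refactor above splits path retrieval: `GetExecutablePath()` now returns the full binary path on each platform, and the folder variant is derived from it with `parent_path()`. The relationship in isolation, with a hypothetical wrapper name:

```cpp
#include <cassert>
#include <filesystem>

// The folder containing an executable is just the parent of its full path,
// which is all GetExecutableFolderContainerPath now computes.
std::filesystem::path FolderOf(const std::filesystem::path& exe) {
  return exe.parent_path();
}
```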

engine/utils/file_manager_utils.h

Lines changed: 2 additions & 0 deletions

```diff
@@ -20,6 +20,8 @@ inline std::string cortex_config_file_path;
 
 inline std::string cortex_data_folder_path;
 
+std::filesystem::path GetExecutablePath();
+
 std::filesystem::path GetExecutableFolderContainerPath();
 
 std::filesystem::path GetHomeDirectoryPath();
```

0 commit comments