
server : use std::move whenever possible #12936


Merged
merged 8 commits into from
Apr 18, 2025

Conversation

@ngxson ngxson (Collaborator) commented Apr 14, 2025

Ref comment : #12898 (comment)

This PR was made by simply adding a std::unique_ptr<int> dummy member to the struct we want to investigate (a unique_ptr is move-only, so every accidental copy becomes a compile error), then removing that unique_ptr once all the std::move calls were added.
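The trick works because a std::unique_ptr member deletes the enclosing struct's copy constructor while leaving the move constructor intact, so the compiler flags every place a copy sneaks in. A minimal sketch of the idea (probe_task and its fields are illustrative, not the PR's actual struct):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Adding a unique_ptr member makes the struct move-only:
// any accidental copy now fails to compile.
struct probe_task {
    int id = 0;
    std::string prompt;
    std::unique_ptr<int> dummy = std::make_unique<int>(0); // temporary probe member
};

// probe_task t2 = t1;            // error: copy constructor is deleted
// probe_task t3 = std::move(t1); // OK: implicit move constructor

bool moved_out_leaves_source_empty() {
    probe_task t1;
    t1.prompt = "hello";
    probe_task t2 = std::move(t1);
    // a moved-from unique_ptr is guaranteed to be null
    return t2.prompt == "hello" && t1.dummy == nullptr && t2.dummy != nullptr;
}
```

Once every copy site has been converted to an explicit std::move, the probe member is deleted again and the struct compiles as before.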

@@ -2105,7 +2105,7 @@ struct server_context {
         return true;
     }
 
-    bool launch_slot_with_task(server_slot & slot, const server_task & task) {
+    bool launch_slot_with_task(server_slot & slot, const server_task && task) {
Member:
Suggested change
-    bool launch_slot_with_task(server_slot & slot, const server_task && task) {
+    bool launch_slot_with_task(server_slot & slot, server_task && task) {

Collaborator Author:

We also move the params and prompt_tokens from task to slot, so I think this is currently not possible

Member:

What is not possible? With the const present, the moves of params and prompt_tokens are actually copies. Removing the const from the argument will make them proper moves.

Collaborator Author:

The problem is that prompt_tokens will soon contain image tokens held as unique_ptr, so it will require a proper move.

And even without images (the current case, where we have only text tokens), I think a proper move still makes sense here. This function is the last stop for the task (i.e. we expect the task to be destroyed here).

Collaborator Author:

Ah, now looking back at this, I realize I had it reversed (I thought you wanted to add the const); sorry for the confusion 🥲
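For reference, the behavior ggerganov points out above is easy to demonstrate: std::move on a const object yields a const rvalue reference, which cannot bind to the move constructor's non-const T&& parameter, so overload resolution silently falls back to the copy constructor. A small tracer type (illustrative, not from the PR) makes this visible:

```cpp
#include <cassert>
#include <utility>

struct tracer {
    bool was_moved = false;
    tracer() = default;
    tracer(const tracer &) : was_moved(false) {}     // copy ctor
    tracer(tracer &&) noexcept : was_moved(true) {}  // move ctor
};

bool move_from_const_copies() {
    const tracer a;
    tracer b = std::move(a);  // const tracer&& cannot bind to tracer&&: copies
    return b.was_moved == false;
}

bool move_from_nonconst_moves() {
    tracer a;
    tracer b = std::move(a);  // tracer&& binds to the move ctor: moves
    return b.was_moved == true;
}
```

This is why dropping the const from the `server_task && task` parameter is what turns the subsequent std::move calls into real moves.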

Comment on lines 3636 to 3642
         server_task task(SERVER_TASK_TYPE_METRICS);
         task.id = ctx_server.queue_tasks.get_new_id();
         ctx_server.queue_results.add_waiting_task_id(task.id);
-        ctx_server.queue_tasks.post(task, true); // high-priority task
+        ctx_server.queue_tasks.post(std::move(task), true); // high-priority task
 
         // get the result
         server_task_result_ptr result = ctx_server.queue_results.recv(task.id);
Member:

The task should not be referenced after being moved. We can make this safer like this:

        const auto id = ctx_server.queue_tasks.get_new_id();

        {
            server_task task(SERVER_TASK_TYPE_METRICS);
            task.id = id;
            ctx_server.queue_results.add_waiting_task_id(task.id);
            ctx_server.queue_tasks.post(std::move(task), true); // high-priority task
        }

Collaborator Author:

Thanks, that's a good idea. I also spotted some places where task is used after the move. It should be fixed in my last commit: 9487165
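The pattern in the suggestion above generalizes: copy out any field you still need (here the task id) before the object is consumed by std::move, and scope the object so it cannot be touched afterwards. A sketch under assumed names (task_queue, post, and submit are hypothetical stand-ins for the server's queue API):

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

struct task {
    int id = 0;
    std::string payload;
};

std::deque<task> task_queue;

// Hypothetical post(): takes ownership of the task by value.
int post(task t) {
    const int task_id = t.id;           // capture before the move below
    task_queue.push_back(std::move(t)); // t is moved-from past this point
    return task_id;                     // safe: reads the saved copy, not t
}

int submit() {
    int result_id = 0;
    {
        task t;
        t.id      = 42;
        t.payload = "metrics";
        result_id = post(std::move(t));
    } // t goes out of scope here, so it cannot be used after the move
    return result_id;
}
```

Reading a field of a moved-from object compiles fine but yields an unspecified value for most types, which is exactly the class of bug the scoped block rules out.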

Comment on lines 1568 to 1570
             queue_tasks.push_front(task);
         } else {
-            queue_tasks.push_back(std::move(task));
+            queue_tasks.push_back(task);
Member:

Without these moves, the task will be copied here. Better to keep them.

Collaborator Author:

Thanks, it seems I mistakenly removed this move. Added it back in 240ea24
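To see what keeping the std::move buys here: pushing an lvalue into a std::deque invokes the copy constructor, while pushing std::move(task) selects the rvalue overload of push_back and moves instead. A counting type (illustrative, not from the PR) makes the difference observable:

```cpp
#include <cassert>
#include <deque>
#include <utility>

struct counted {
    static inline int copies = 0;
    static inline int moves  = 0;
    counted() = default;
    counted(const counted &) { ++copies; }          // lvalue push lands here
    counted(counted &&) noexcept { ++moves; }       // std::move push lands here
};

void push_with_and_without_move() {
    std::deque<counted> q;
    counted a, b;
    q.push_back(a);            // copy: a is an lvalue
    q.push_back(std::move(b)); // move: avoids the copy entirely
}
```

For a server_task carrying a full prompt token vector and sampling params, that avoided copy is the whole point of the PR.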

@ggerganov ggerganov (Member) left a comment

A few updates and it's good to merge:

diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index 528d4e6a..c580ec12 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1563,14 +1563,15 @@ struct server_queue {
         if (task.type == SERVER_TASK_TYPE_CANCEL) {
             cleanup_pending_task(task.id_target);
         }
-        QUE_DBG("new task, id = %d, front = %d\n", task.id, front);
+        const int task_id = task.id;
+        QUE_DBG("new task, id = %d, front = %d\n", task_id, front);
         if (front) {
             queue_tasks.push_front(std::move(task));
         } else {
             queue_tasks.push_back(std::move(task));
         }
         condition_tasks.notify_one();
-        return task.id;
+        return task_id;
     }
 
     // multi-task version of post()
@@ -2105,7 +2106,7 @@ struct server_context {
         return true;
     }
 
-    bool launch_slot_with_task(server_slot & slot, const server_task && task) {
+    bool launch_slot_with_task(server_slot & slot, server_task && task) {
         slot.reset();
         slot.id_task       = task.id;
         slot.index         = task.index;
@@ -2113,10 +2114,10 @@ struct server_context {
         slot.params        = std::move(task.params);
         slot.prompt_tokens = std::move(task.prompt_tokens);
 
-        if (!are_lora_equal(task.params.lora, slot.lora)) {
+        if (!are_lora_equal(slot.params.lora, slot.lora)) {
             // if lora is changed, we cannot reuse cached tokens
             slot.cache_tokens.clear();
-            slot.lora = task.params.lora;
+            slot.lora = slot.params.lora;
         }
 
         bool can_detokenize = can_be_detokenized(ctx, slot.prompt_tokens);
@@ -3952,44 +3953,42 @@ int main(int argc, char ** argv) {
 
         auto completion_id = gen_chatcmplid();
         std::unordered_set<int> task_ids;
-        {
+        try {
             std::vector<server_task> tasks;
 
-            try {
-                const auto & prompt = data.at("prompt");
-                // TODO: this log can become very long, put it behind a flag or think about a more compact format
-                //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
-
-                std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.vocab, prompt, true, true);
-                tasks.reserve(tokenized_prompts.size());
-                for (size_t i = 0; i < tokenized_prompts.size(); i++) {
-                    server_task task = server_task(type);
-
-                    task.id    = ctx_server.queue_tasks.get_new_id();
-                    task.index = i;
-
-                    task.prompt_tokens    = std::move(tokenized_prompts[i]);
-                    task.params           = server_task::params_from_json_cmpl(
-                                                ctx_server.ctx,
-                                                ctx_server.params_base,
-                                                data);
-                    task.id_selected_slot = json_value(data, "id_slot", -1);
-
-                    // OAI-compat
-                    task.params.oaicompat                 = oaicompat;
-                    task.params.oaicompat_cmpl_id         = completion_id;
-                    // oaicompat_model is already populated by params_from_json_cmpl
-
-                    tasks.push_back(std::move(task));
-                }
-            } catch (const std::exception & e) {
-                res_error(res, format_error_response(e.what(), ERROR_TYPE_INVALID_REQUEST));
-                return;
+            const auto & prompt = data.at("prompt");
+            // TODO: this log can become very long, put it behind a flag or think about a more compact format
+            //SRV_DBG("Prompt: %s\n", prompt.is_string() ? prompt.get<std::string>().c_str() : prompt.dump(2).c_str());
+
+            std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.vocab, prompt, true, true);
+            tasks.reserve(tokenized_prompts.size());
+            for (size_t i = 0; i < tokenized_prompts.size(); i++) {
+                server_task task = server_task(type);
+
+                task.id    = ctx_server.queue_tasks.get_new_id();
+                task.index = i;
+
+                task.prompt_tokens    = std::move(tokenized_prompts[i]);
+                task.params           = server_task::params_from_json_cmpl(
+                        ctx_server.ctx,
+                        ctx_server.params_base,
+                        data);
+                task.id_selected_slot = json_value(data, "id_slot", -1);
+
+                // OAI-compat
+                task.params.oaicompat                 = oaicompat;
+                task.params.oaicompat_cmpl_id         = completion_id;
+                // oaicompat_model is already populated by params_from_json_cmpl
+
+                tasks.push_back(std::move(task));
             }
 
             task_ids = server_task::get_list_id(tasks);
             ctx_server.queue_results.add_waiting_tasks(tasks);
             ctx_server.queue_tasks.post(std::move(tasks));
+        } catch (const std::exception & e) {
+            res_error(res, format_error_response(e.what(), ERROR_TYPE_INVALID_REQUEST));
+            return;
         }
 
         bool stream = json_value(data, "stream", false);

@ngxson ngxson (Collaborator Author) commented Apr 18, 2025

Thanks @ggerganov for helping! I underestimated this change. Applied your suggestions in my last commit 👍

@ngxson ngxson merged commit 35370ba into ggml-org:master Apr 18, 2025
48 of 51 checks passed
colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 21, 2025
* server : use std::move whenever possible

* use r-value ref

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* make task creation scoped

* restore std::move

* fix task_id not set correctly

* apply changes from suggestion

Co-authored-by: ggerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>