ggml : replace reallocation to reuse vector #11722

lexasub · 2025-02-06T20:37:29Z

Good afternoon! I've found a way to optimize tensor transmission a bit further.
The improvement is roughly a 10% reduction in wall-clock time and 20% according to the profiler. The change is still a draft because I'm unsure how much to reserve initially for the vector.
Perhaps we could take the "maximum tensor size" from somewhere, if it isn't too large, or define a named constant instead of a hardcoded magic number like the one I picked arbitrarily for now. I believe the change is clear: there is no more reallocation, whereas previously recv_msg called resize on an empty vector every time.
I'm also experimenting with batching the sends.
@rgerganov This change is very minor, so sorry for bothering you over such a small thing. If I open it as a draft PR, it might go unnoticed; if I open it for merging, someone might accidentally merge my arbitrary constant.
Also, I see this warning in rpc-server: "check_node_graph_compatibility_and_refresh_copy_ops: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 2 1 1]". What does it mean?
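A minimal sketch of the two allocation patterns described above, assuming a simplified receive path: `fake_recv`, `recv_msg_fresh` and `recv_msg_reuse` are hypothetical stand-ins, not the actual ggml-rpc functions. The first builds a new vector per message, so each call allocates; the second resizes a caller-owned buffer, which only reallocates when a message is larger than its current capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for reading `size` bytes from a socket into `dst`.
static void fake_recv(const uint8_t * src, uint8_t * dst, size_t size) {
    std::memcpy(dst, src, size);
}

// Old pattern: a fresh, empty vector is resized for every message,
// so every call performs a heap allocation of `size` bytes.
std::vector<uint8_t> recv_msg_fresh(const uint8_t * src, size_t size) {
    std::vector<uint8_t> input;
    input.resize(size);
    fake_recv(src, input.data(), size);
    return input;
}

// Proposed pattern: the caller keeps one long-lived buffer. resize() only
// reallocates when `size` exceeds the buffer's current capacity.
void recv_msg_reuse(const uint8_t * src, size_t size, std::vector<uint8_t> & input) {
    input.resize(size);
    fake_recv(src, input.data(), size);
}
```

A server loop would call `reserve()` once on its buffer with whatever initial size is chosen and then pass the same buffer to every call; the initial value only decides how many early messages still trigger a reallocation.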

ggerganov · 2025-02-07T14:07:33Z

I think the initial reserve value is not very important to be the most optimal. The vector size would settle after a few calls anyway.
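To illustrate why the initial value matters little, here is a small sketch, assuming a reused buffer with no initial reserve: `count_growths` (a hypothetical helper) counts how often the capacity changes over a sequence of message sizes. Growth can only happen when a message is larger than anything seen so far, so after the largest message the buffer stops reallocating.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how many times a reused buffer has to grow its capacity
// while serving a sequence of message sizes.
size_t count_growths(const std::vector<size_t> & sizes) {
    std::vector<uint8_t> buf;  // deliberately no initial reserve
    size_t growths = 0;
    for (size_t n : sizes) {
        size_t before = buf.capacity();
        buf.resize(n);  // shrinking the size never reduces capacity
        if (buf.capacity() != before) {
            ++growths;
        }
    }
    return growths;
}
```

For the sizes `{100, 50, 200, 200, 10, 150, 200}` the capacity changes at most twice (for 100 and for 200); every later message fits in the existing allocation.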

rgerganov · 2025-02-10T09:34:59Z

The problem with this change is that it will make rpc-server consume a lot of RAM because the capacity of set_tensor_vec will be equal to the size of the largest tensor.
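The RAM concern follows from how `std::vector` behaves: making it smaller with `resize()` does not give memory back, so after one large tensor the buffer keeps that allocation. A minimal sketch of one possible mitigation, assuming a hypothetical threshold `max_retained` (not an existing rpc-server option), releases the storage once it grows past the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// After a message has been processed, drop the reused buffer's storage if it
// has grown past `max_retained` bytes. `max_retained` is a hypothetical,
// illustrative knob, not an existing rpc-server option.
void release_if_oversized(std::vector<uint8_t> & buf, size_t max_retained) {
    if (buf.capacity() > max_retained) {
        // Swapping with a temporary empty vector frees the old allocation when
        // the temporary is destroyed; unlike shrink_to_fit(), this is not
        // merely a non-binding request.
        std::vector<uint8_t>().swap(buf);
    }
}
```

This only bounds the memory held between messages. Messages above the threshold go back to allocating on every call, so it does not speed up transfers of large tensors.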

I think the proper way for speeding up the model transfer over the network is using hashes and local copies as I have explained here
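As a rough illustration only: a hash-and-local-copy scheme could have the client send a short hash before the tensor data and let the server fill the tensor from a copy it already has. The sketch below assumes a 64-bit FNV-1a hash and keeps the "local copies" in an in-memory map. `TensorCache`, `set_tensor_by_hash` and `upload` are hypothetical names, not the ggml-rpc API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a hash of a byte buffer.
uint64_t fnv1a_64(const uint8_t * data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;           // FNV prime
    }
    return h;
}

// Hypothetical server-side cache; a real server would keep copies on local disk.
struct TensorCache {
    std::unordered_map<uint64_t, std::vector<uint8_t>> entries;

    // Try to satisfy a "set tensor by hash" request from a local copy.
    bool set_tensor_by_hash(uint64_t hash, std::vector<uint8_t> & dst) const {
        auto it = entries.find(hash);
        if (it == entries.end()) {
            return false;  // miss: the client has to send the full data
        }
        dst = it->second;
        return true;
    }

    // Full upload: store a copy so later loads can skip the transfer.
    void set_tensor(const std::vector<uint8_t> & data, std::vector<uint8_t> & dst) {
        entries[fnv1a_64(data.data(), data.size())] = data;
        dst = data;
    }
};

// Client side: send the 8-byte hash first and the payload only on a miss.
// Returns the number of payload bytes that had to cross the network.
size_t upload(TensorCache & server, const std::vector<uint8_t> & data,
              std::vector<uint8_t> & dst) {
    uint64_t h = fnv1a_64(data.data(), data.size());
    if (server.set_tensor_by_hash(h, dst)) {
        return 0;
    }
    server.set_tensor(data, dst);
    return data.size();
}
```

On a second load of the same model, only hashes are exchanged. A hash collision would silently load the wrong data, so a real scheme would likely also check the tensor size or use a stronger hash.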

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Feb 6, 2025

lexasub changed the title ~~ggml : draft commit, replace reallocation to reuse vector~~ ggml : replace reallocation to reuse vector Feb 6, 2025

lexasub mentioned this pull request Feb 15, 2025

Eval bug: very slow inference on DeepSeek-R1-Distill-Qwen-32B #11361 (closed)

lexasub force-pushed the reduce-reallocation branch from 01feb09 to 1c8116c February 24, 2025 17:06

ggml : draft commit, replace reallocation of vector for set_tensor by reserve inital vector and use it (355c1fb)

lexasub force-pushed the reduce-reallocation branch from 1c8116c to 355c1fb February 24, 2025 18:12