
ggml : replace reallocation to reuse vector #11722

Draft
lexasub wants to merge 1 commit into master
Conversation

lexasub (Contributor) commented Feb 6, 2025

Good afternoon! I've found a way to optimize tensor transmission a bit further.
The improvement is roughly a 10% reduction in time, and about 20% according to the profiler. The change is still a draft because I'm unsure how much to reserve for the vector initially.
Perhaps we could get the "maximum tensor size" from somewhere if it's not too large, or use a constant, just not a hardcoded magic number like the one I used for now (I picked it arbitrarily). I believe the change is clear: there is no reallocation anymore, whereas previously we were calling resize on an empty vector on every call to the recv_msg function.
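A minimal sketch of the idea (recv_msg is the function named above; the surrounding struct and the recv_buf member are illustrative, not the actual ggml RPC code):

```cpp
// Illustrative sketch only: rpc_connection and recv_buf are made-up
// names; the real ggml RPC code is structured differently.
#include <cstdint>
#include <vector>

struct rpc_connection {
    // Persistent buffer reused across messages. After the first few
    // calls its capacity reaches a high-water mark and resize() no
    // longer allocates.
    std::vector<uint8_t> recv_buf;

    void recv_msg(size_t msg_size) {
        // Before: a fresh local vector was resized on every call,
        // which meant one allocation per message.
        // Now: resize() only allocates when msg_size exceeds the
        // current capacity.
        recv_buf.resize(msg_size);
        // ... read msg_size bytes from the socket into recv_buf.data() ...
    }
};
```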
Also, I'm experimenting with batched sending.
@rgerganov This change is very minor, sorry for bothering you over such a small thing. On the one hand, if I keep it as a draft PR, it might go unnoticed; on the other hand, if I open it for merging, someone might accidentally merge my arbitrary constant.
Also, I see a warning message in rpc-server: "check_node_graph_compatibility_and_refresh_copy_ops: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 2 1 1]". What does it mean?

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Feb 6, 2025
@lexasub changed the title from "ggml : draft commit, replace reallocation to reuse vector" to "ggml : replace reallocation to reuse vector" Feb 6, 2025
ggerganov (Owner) commented
I don't think the initial reserve value needs to be optimal. The vector size would settle after a few calls anyway.

rgerganov (Collaborator) commented
The problem with this change is that it will make rpc-server consume a lot of RAM because the capacity of set_tensor_vec will be equal to the size of the largest tensor.
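For illustration (a standalone example, not ggml code): `std::vector` does not release memory on a shrinking resize(), so a single large tensor pins its size for the lifetime of the buffer:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    std::vector<uint8_t> buf;
    buf.resize(512 * 1024 * 1024);  // one large tensor arrives: 512 MiB
    buf.resize(1024);               // later messages are tiny...
    // ...but the capacity (and the RAM) stays at the high-water mark:
    printf("size = %zu, capacity = %zu\n", buf.size(), buf.capacity());
    // buf.shrink_to_fit() would release it, but that defeats the reuse.
    return 0;
}
```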

I think the proper way for speeding up the model transfer over the network is using hashes and local copies as I have explained here
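As a rough illustration of that idea (all names here are hypothetical; neither the actual ggml RPC code nor the linked proposal is quoted): the client sends a content hash first, and the payload crosses the network only when the server has no local copy.

```cpp
// Hypothetical sketch of hash-based tensor transfer; none of these
// identifiers come from the ggml RPC API.
#include <cstdint>
#include <unordered_map>
#include <vector>

using blob = std::vector<uint8_t>;

// FNV-1a: a simple placeholder hash; a real implementation would use
// something stronger to avoid collisions.
uint64_t fnv1a(const uint8_t * data, size_t n) {
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < n; i++) {
        h = (h ^ data[i]) * 1099511628211ull;
    }
    return h;
}

// Server-side cache of previously received tensors, keyed by content hash.
static std::unordered_map<uint64_t, blob> tensor_cache;

// Step 1: the client sends only the hash; the server answers whether
// it already holds a local copy.
bool server_has_tensor(uint64_t hash) {
    return tensor_cache.count(hash) != 0;
}

// Step 2: the payload is transferred and stored only on a cache miss.
void server_store_tensor(uint64_t hash, blob payload) {
    tensor_cache.emplace(hash, std::move(payload));
}
```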
