ggml : replace reallocation to reuse vector #11722
Draft
Good afternoon! I've found a way to optimize tensor transmission a bit further.
The improvement is roughly 10% less wall-clock time, and about 20% less according to the profiler. The changes are still a draft because I'm not sure how much to allocate up front for the vector.
Perhaps we could take the "maximum tensor size" from somewhere, if it isn't too large, or define a named constant instead of the arbitrary magic number I picked for now. I think the change itself is clear: the buffer is no longer reallocated for every message, whereas previously recv_msg simply called resize on an empty vector each time.
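A minimal sketch of the reuse pattern described above, under simplifying assumptions: `msg_receiver`, `read_exact`, and the in-memory source are hypothetical stand-ins, not the actual ggml-rpc code. The point it illustrates is that `std::vector::resize` to a size no larger than `capacity()` does not reallocate, so a buffer that outlives individual calls only grows when a message is larger than anything seen before. The initial reservation is a constructor parameter because picking that value is exactly the open question in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for reading exactly `size` bytes from a socket.
// It copies from an in-memory source so the sketch runs on its own.
static bool read_exact(const uint8_t * src, size_t size, uint8_t * dst) {
    assert(size == 0 || (src != nullptr && dst != nullptr));
    if (size > 0) {
        std::memcpy(dst, src, size);
    }
    return true;
}

// Hypothetical receiver that owns one buffer and reuses it across messages.
struct msg_receiver {
    std::vector<uint8_t> buf;

    // Pre-reserve storage; choosing `initial_capacity` is the open question.
    explicit msg_receiver(size_t initial_capacity) {
        buf.reserve(initial_capacity);
    }

    // resize() only reallocates when size > capacity(); otherwise the
    // existing storage is reused. Shrinking never releases capacity.
    bool recv_msg(const uint8_t * src, size_t size) {
        buf.resize(size);
        return read_exact(src, size, buf.data());
    }
};
```

A message larger than the current capacity still triggers one reallocation, after which the larger buffer is kept, so an undersized initial reservation costs a few early growths rather than correctness. Note that `resize()` still zero-fills elements beyond the previous size, so the saving is the allocation itself, not the fill.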
Also, I'm going to experiment with batching the sends.
@rgerganov This change is very minor; sorry for bothering you over such a small thing. On one hand, if I create a draft MR, it might go unnoticed. On the other hand, if I open it for merging, someone might accidentally merge my arbitrary constant.
Also, I see this warning in rpc-server: "check_node_graph_compatibility_and_refresh_copy_ops: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 2 1 1]". What does it mean?