I noticed that `llama_context::decode()` appears to rebuild the compute graph for every token: https://github.com/ggml-org/llama.cpp/blob/master/src/llama-context.cpp#L955-L971

Why isn't the graph, including the splitting across backends, reused across tokens?
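
For reference, here is a minimal sketch of the pattern I mean, written against the public ggml backend-scheduler API rather than the actual internals of `llama-context.cpp`; `build_llm_graph()` is a hypothetical stand-in for the model's graph-construction code:

```cpp
// Minimal sketch, not the actual llama.cpp code: one decode step that
// rebuilds and re-schedules the graph, using the public ggml API.
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical stand-in for the graph-construction code that assembles
// the transformer forward pass and returns the output tensor.
ggml_tensor * build_llm_graph(ggml_context * ctx);

void decode_one_ubatch(ggml_backend_sched_t sched) {
    // A fresh ggml_context and cgraph are created for this ubatch.
    ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead()*GGML_DEFAULT_GRAPH_SIZE +
                          ggml_graph_overhead(),
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true, // tensor data is placed by the scheduler
    };
    ggml_context * ctx = ggml_init(params);
    ggml_cgraph  * gf  = ggml_new_graph(ctx);

    // The full forward graph is reconstructed node by node.
    ggml_tensor * logits = build_llm_graph(ctx);
    ggml_build_forward_expand(gf, logits);

    // The scheduler re-splits the new graph across backends and
    // re-allocates activation storage before computing it.
    ggml_backend_sched_reset(sched);
    ggml_backend_sched_alloc_graph(sched, gf);
    ggml_backend_sched_graph_compute(sched, gf);

    ggml_free(ctx);
}
```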