I noticed that `llama_context::decode()` appears to rebuild the compute graph for every token: https://github.com/ggml-org/llama.cpp/blob/master/src/llama-context.cpp#L955-L971

Why isn't the graph, including the splitting across backends, reused across tokens?
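
For reference, here is a minimal sketch of the pattern I mean, written against the public ggml backend-scheduler API rather than the actual internals of `llama-context.cpp`; `build_llm_graph()` is a hypothetical stand-in for the model's graph-construction code:

```cpp
// Minimal sketch, not the actual llama.cpp code: one decode step that
// rebuilds and re-schedules the graph, using the public ggml API.
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical stand-in for the graph-construction code that assembles
// the transformer forward pass and returns the output tensor.
ggml_tensor * build_llm_graph(ggml_context * ctx);

void decode_one_ubatch(ggml_backend_sched_t sched) {
    // A fresh ggml_context and cgraph are created for this ubatch.
    ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead()*GGML_DEFAULT_GRAPH_SIZE +
                          ggml_graph_overhead(),
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true, // tensor data is placed by the scheduler
    };
    ggml_context * ctx = ggml_init(params);
    ggml_cgraph  * gf  = ggml_new_graph(ctx);

    // The full forward graph is reconstructed node by node.
    ggml_tensor * logits = build_llm_graph(ctx);
    ggml_build_forward_expand(gf, logits);

    // The scheduler re-splits the new graph across backends and
    // re-allocates activation storage before computing it.
    ggml_backend_sched_reset(sched);
    ggml_backend_sched_alloc_graph(sched, gf);
    ggml_backend_sched_graph_compute(sched, gf);

    ggml_free(ctx);
}
```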