Question regarding loading model data from model GGUF file to Main Memory #12455
akapoor3518 started this conversation in General
Hi,
Currently our GPU's main memory is limited to 1 GB. Can we use a bigger model with our custom backend? Some larger models need more than 1 GB of memory: do we have to load all tensors during llama_init_from_model, or can we load them just before a particular compute? I understand this is not the best-performing approach, but for now we are only looking for functionality. Our memory constraint will be resolved soon, and at that point we can look at performance and do proper graph planning.
Thanks,