How to (Can I) load a bigger model with a Metal build? #3157
Replies: 3 comments 1 reply
-
Short answer: no.
-
There's #2182, which lets you adjust the unified memory split. Assuming your unified memory is close to your total memory, you'd pretty much only run into the limit if the model was bigger than your available memory. At that point, weights are being read off the disk on every token, and that's far more likely to be the bottleneck than anything else. So even if you run the actual calculations on the GPU rather than the CPU, you probably won't see much of a performance increase - it's going to be spending most of its time waiting on IO.
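As a rough sketch of what adjusting the split looks like in practice: on recent macOS the wired-memory cap that Metal respects can be raised with a sysctl. The exact sysctl name (`iogpu.wired_limit_mb` on Sonoma, `debug.iogpu.wired_limit` on earlier releases) and the numbers below are assumptions for illustration, not something stated in this thread; the setting does not persist across reboots.

```shell
# Check total physical memory (bytes) to pick a sensible limit:
sysctl hw.memsize

# Hypothetical example: allow Metal to wire up to ~28 GB of unified
# memory on a 32 GB machine, leaving headroom for the OS.
# (On pre-Sonoma macOS the knob was debug.iogpu.wired_limit instead.)
sudo sysctl iogpu.wired_limit_mb=28672
```

Even with a higher limit, the IO-bound behavior described above still applies once the model exceeds physical memory.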
-
On my MacBook M2 I tried both Metal and CPU.
Metal is very fast, but it can't work with large models:
ggml_metal_graph_compute: command buffer 0 failed with status 5
Can I load part of the weights into GPU buffers and use the CPU to compute the rest?
Is this supported right now? If so, would it be faster than CPU-only?
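For what it's worth, llama.cpp does expose a flag for exactly this kind of partial offload: `-ngl` / `--n-gpu-layers` keeps only the first N layers on the GPU and runs the rest on the CPU. A minimal sketch, where the model path and layer count are placeholders, not values from this thread:

```shell
# Offload only 20 transformer layers to Metal; the remaining layers
# stay on the CPU. Lower -ngl until the Metal buffer fits in the
# wired-memory limit.
./main -m models/llama-2-13b.Q4_K_M.gguf \
       -p "Hello" \
       -ngl 20
```

Whether this beats CPU-only depends on how many layers fit: per the reply above, once weights spill past available memory, disk IO dominates regardless of where the math runs.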