The head layer is typically the largest layer in the model, and sadly it's not uncommon for that to be the point where you run out of VRAM. During conversion, the quantizer needs to hold three copies of the full tensor in FP32 precision, which on its own comes to about 7.5 GB for this model, on top of everything else the framework is doing. So I'm afraid that, as currently implemented, 11 GB just isn't enough VRAM to convert this model.
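As a rough sanity check on that 7.5 GB figure, here is a back-of-the-envelope sketch. The hidden size and vocabulary size below are assumptions for a Mistral-Nemo-style 12B model such as Wayfarer-12B (check the model's config.json for the real values); they are not taken from the thread.

```python
# Rough estimate of the scratch VRAM the quantizer needs for the head layer.
# hidden_size and vocab_size are assumed values for a 12B Nemo-style model.
hidden_size = 5120
vocab_size = 131_072
bytes_per_fp32 = 4

head_params = hidden_size * vocab_size                 # ~671M weights in the head
one_copy_gb = head_params * bytes_per_fp32 / 1024**3   # ~2.5 GB per FP32 copy
three_copies_gb = 3 * one_copy_gb                      # ~7.5 GB held at once

print(f"one FP32 copy:  {one_copy_gb:.2f} GB")
print(f"three copies:   {three_copies_gb:.2f} GB")
```

With those assumed dimensions the three FP32 copies alone come out to roughly 7.5 GB, before counting the rest of the model and framework overhead, which is why an 11 GB card falls short.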
OS: Linux
GPU Library: CUDA 12.x
Python version: 3.12
Pytorch version: 2.4.0
Model: LatitudeGames/Wayfarer-12B
Describe the bug
When using a 2080 Ti (11 GB VRAM), the conversion gets all the way to the save stage of the measurement pass and then crashes after running out of memory.
How much VRAM is required for 12B models?
I'm trying to find a use for my 2080 Ti other than having it sit around collecting dust.
Reproduction steps
.
Expected behavior
.
Logs
.
Additional context
.
Acknowledgements