The head layer is typically the largest layer in the model, and sadly it's not uncommon for that to be the point where you run out of VRAM. During conversion, the quantizer needs to hold three copies of the full tensor in FP32 precision, which on its own comes to about 7.5 GB for this model, on top of everything else the framework is doing. So I'm afraid that, as currently implemented, 11 GB just isn't enough VRAM to convert this model.
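As a rough sanity check on that 7.5 GB figure, here is a back-of-the-envelope sketch. The hidden size and vocabulary size below are assumptions for a Mistral-Nemo-style 12B model such as Wayfarer-12B (check the model's config.json for the real values); they are not taken from the thread.

```python
# Rough estimate of the scratch VRAM the quantizer needs for the head layer.
# hidden_size and vocab_size are assumed values for a 12B Nemo-style model.
hidden_size = 5120
vocab_size = 131_072
bytes_per_fp32 = 4

head_params = hidden_size * vocab_size                 # ~671M weights in the head
one_copy_gb = head_params * bytes_per_fp32 / 1024**3   # ~2.5 GB per FP32 copy
three_copies_gb = 3 * one_copy_gb                      # ~7.5 GB held at once

print(f"one FP32 copy:  {one_copy_gb:.2f} GB")
print(f"three copies:   {three_copies_gb:.2f} GB")
```

With those assumed dimensions the three FP32 copies alone come out to roughly 7.5 GB, before counting the rest of the model and framework overhead, which is why an 11 GB card falls short.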
OS: Linux
GPU Library: CUDA 12.x
Python version: 3.12
Pytorch version: 2.4.0
Model: LatitudeGames/Wayfarer-12B
Describe the bug
When using a 2080 Ti (11 GB VRAM), the conversion gets all the way to the save stage of the measurement pass and then crashes after running out of memory.
How much VRAM is required for 12B models?
I'm trying to find a use for my 2080 Ti other than having it sit around collecting dust.
Reproduction steps
.
Expected behavior
.
Logs
.
Additional context
.
Acknowledgements