Using `torch.as_tensor` we can alias the tensor data rather than copy it
during GGUF file loading. This avoids duplicating the entire tensor
contents when tracing torch programs, which substantially decreases
memory usage on large models.
For example, with LLaMa 70b the memory allocated for tensors dropped
from 60+ GB to 2 GB.
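
The aliasing behavior can be illustrated with a minimal sketch. It is not
the loader code from this change: the `data` array is a hypothetical
stand-in for tensor contents read from a GGUF file, assumed here to arrive
as a CPU NumPy array whose dtype torch supports.

```python
import numpy as np
import torch

# Hypothetical stand-in for tensor data read from a GGUF file.
# A real loader would typically hand back a (possibly memory-mapped) array.
data = np.arange(6, dtype=np.float32).reshape(2, 3)

aliased = torch.as_tensor(data)  # reuses the NumPy buffer, no copy
copied = torch.tensor(data)      # always allocates new storage and copies

# Mutate the source buffer to see which tensor shares memory with it.
data[0, 0] = 42.0

aliased_sees_write = aliased[0, 0].item() == 42.0
copied_sees_write = copied[0, 0].item() == 42.0

# The aliased tensor points at the same address as the NumPy array.
same_buffer = aliased.data_ptr() == data.ctypes.data

print(aliased_sees_write, copied_sees_write, same_buffer)
```

Note that `torch.as_tensor` only avoids the copy when no dtype or device
conversion is needed; otherwise it falls back to copying.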