Commit e051c37

Alias gguf tensors instead of copy (#167)
Using `torch.as_tensor`, we can alias the tensor rather than copy it during gguf file loading. This avoids duplicating the entire tensor contents when tracing torch programs, which substantially decreases memory usage on large models. For example, with LLaMa 70b, memory allocated for tensors decreased from 60+ GB to 2 GB.
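The difference between copying and aliasing can be seen in a minimal sketch. The array below is a hypothetical stand-in for data read from a gguf file, not the loader's actual input; the sketch assumes the data arrives as a NumPy array, which is the case where `torch.as_tensor` can reuse the existing buffer.

```python
import numpy as np
import torch

# Hypothetical stand-in for tensor data loaded from a gguf file.
data = np.arange(6, dtype=np.float32)

copied = torch.tensor(data)      # allocates new storage and copies the contents
aliased = torch.as_tensor(data)  # reuses the NumPy buffer when dtype/device allow

# Mutating the source array is visible only through the aliased tensor.
data[0] = 100.0
copied_first = copied[0].item()
aliased_first = aliased[0].item()

# The aliased tensor points at the same memory as the NumPy array.
shares_memory = aliased.data_ptr() == data.ctypes.data
```

Because the aliased tensor shares storage with its source, the source buffer has to stay alive and unmodified for as long as the tensor is in use.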
1 parent 547ced4 commit e051c37

1 file changed, +3 -2 lines changed
  • sharktank/sharktank/types/gguf_interop

sharktank/sharktank/types/gguf_interop/base.py

@@ -80,9 +80,10 @@ def _externalize_tensor(
     # Important: The annotation tag must be set on the actual leaf tensor
     # which is stored in the root theta. This means that any shaping or
     # data type massaging has to happen *before* annotating.
-    data_tensor = torch.tensor(data)
     if logical_shape is not None:
-        data_tensor = data_tensor.reshape(logical_shape)
+        data_tensor = torch.as_tensor(data.reshape(logical_shape))
+    else:
+        data_tensor = torch.as_tensor(data)
     ExternalTensorTrait(external_name=name, external_scope="").set(data_tensor)
     return data_tensor
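The shaping logic after this change can be sketched in isolation as follows. The helper name `to_aliased_tensor` is hypothetical, the `ExternalTensorTrait` annotation step is omitted because it is specific to sharktank, and the sketch assumes `data` is a contiguous NumPy array, in which case `reshape` returns a view rather than a copy.

```python
import numpy as np
import torch


def to_aliased_tensor(data, logical_shape=None):
    """Hypothetical helper mirroring the diff: reshape first, then alias."""
    if logical_shape is not None:
        # Reshaping on the NumPy side yields a view for contiguous arrays,
        # so the resulting tensor still shares the original buffer.
        return torch.as_tensor(data.reshape(logical_shape))
    return torch.as_tensor(data)


# Hypothetical flat buffer standing in for gguf tensor data.
data = np.zeros(6, dtype=np.float32)
tensor = to_aliased_tensor(data, logical_shape=(2, 3))

# A write to the source buffer shows up in the reshaped tensor.
data[0] = 7.0
first_value = tensor[0, 0].item()
same_buffer = tensor.data_ptr() == data.ctypes.data
```

Reshaping before the conversion keeps the whole path copy-free, whereas the previous code copied first and reshaped the copy.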
