[Transforms] Enable shared memory and introduce permutations #284

Open · wants to merge 5 commits into base: transform_apply_support

Conversation

@dsikka (Collaborator) commented Mar 26, 2025

Summary

1. Makes a series of updates to the registry to support caching of the transform matrices

  • With these changes, instead of repeatedly generating Hadamards of a particular size, we cache the matrix using its size as the key. This significantly speeds up transform set-up by leveraging the fact that Hadamards are deterministic (see the sketch after this list)
  • For randomized Hadamards, splits up the math so that the underlying Hadamard can be cached and the randomness is introduced as a separate permutation matrix [will need to add a few tests to make sure my math is correct]
  • For the matrix-multiply transform, caches based on a user-provided name

2. Also uses shared memory, so that layers with identical transforms use the same underlying transform data.

  • Significantly reduces the memory required by transforms
  • NOTE: For training, we may need to revisit this depending on how transform updates are expected to be made during training, since the data is now shared

3. Move update/register functionality to be done inside the registry; introduce a permutation parameter

  • The update and register steps are the same as before but now happen inside the registry for slightly more clarity
  • Introduces the permutation parameter. It is only used for the random Hadamard for now, but the remaining transforms will follow as well

4. Rename "global" to "shared"

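Since items 1 and 2 both hinge on caching and shared storage, here is a minimal sketch of the idea; all names below are illustrative rather than the PR's actual API. The deterministic Hadamard is cached by size, the randomness is pushed into a separate permutation matrix so the randomized transform can be rebuilt as H @ P from the cache, and identical transforms share a single Parameter's storage.

import torch

def deterministic_hadamard(size: int) -> torch.Tensor:
    # Illustrative Sylvester construction; the repo's own helper may differ.
    assert (size & (size - 1)) == 0, "sketch assumes a power-of-two size"
    h = torch.ones(1, 1)
    while h.shape[0] < size:
        h = torch.cat([torch.cat([h, h], dim=1), torch.cat([h, -h], dim=1)], dim=0)
    return h

# 1. Cache deterministic Hadamards by size; a matrix of a given size never changes.
_cache = {}

def cached_hadamard(size: int) -> torch.Tensor:
    if size not in _cache:
        _cache[size] = deterministic_hadamard(size)
    return _cache[size]

# Randomness is factored out of the cached matrix: a column permutation preserves the
# +/-1 entries and orthogonality, so the randomized transform is H @ P.
size = 8
H = cached_hadamard(size)
P = torch.eye(size)[torch.randperm(size)]
randomized = H @ P

# 2. Shared memory: registering the same Parameter on two layers means both reference
# the same storage, so identical transforms are only materialized once.
layer_a, layer_b = torch.nn.Linear(size, size), torch.nn.Linear(size, size)
shared = torch.nn.Parameter(randomized, requires_grad=False)
layer_a.register_parameter("transform", shared)
layer_b.register_parameter("transform", shared)
assert layer_a.transform.data_ptr() == layer_b.transform.data_ptr()
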
@dsikka marked this pull request as ready for review on Mar 26, 2025 at 23:10
@brian-dellabetta (Contributor) left a comment:

i think this makes sense 👍

@@ -78,7 +79,7 @@ def load_transforms(model: Module, model_name_or_path: str):

state_dict = {}
for weight_name, safe_path in weight_mappings.items():
if "transform" in weight_name:
if "transform" in weight_name or "_perm_" in weight_name:
Contributor:

it seems like we have to do some sort of name matching, but i'm wondering if some name collision down the road is going to cause this to run when we don't want it? maybe we could come up with a more unique name or something to prevent false positives

Collaborator (Author):

we have this problem with any parameter we introduce (e.g. weight_scale, weight_g_idx, etc.), but yeah, we can work on making them more unique
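
One possible direction, sketched purely as an illustration (the suffix names here are hypothetical, not what the PR uses): match against an explicit allow-list of suffixes instead of substring containment, which narrows the window for false positives.

# Hypothetical suffix allow-list; the real parameter names would need to be agreed on.
TRANSFORM_WEIGHT_SUFFIXES = ("_transform", "_perm")

def is_transform_weight(weight_name: str) -> bool:
    # endswith with a tuple checks each suffix; avoids matching arbitrary substrings.
    return weight_name.endswith(TRANSFORM_WEIGHT_SUFFIXES)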

):
    if module is None:
        self.transform.data.copy_(data)
        if self.permutation is not None and permutation_data is not None:
            self.permutation.data.copy_(permutation_data)
Contributor:

do we need update_parameter_data here too in case of offloading?

Collaborator (Author):

this is for the case where the parameter isn't registered to a module for whatever reason; update_parameter_data handles module params. I'm not sure this case is totally necessary, but yeah, we would have to add offloading/onloading around it if we decide to keep it
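
For reference, a rough sketch of the two paths being discussed, assuming update_parameter_data(module, data, name) roughly as it exists in compressed-tensors; the method and attribute names are otherwise assumptions, not the PR's implementation.

from typing import Optional

import torch
from compressed_tensors.utils import update_parameter_data  # assumed import path

def update_transform(self, data: torch.Tensor, module: Optional[torch.nn.Module] = None):
    if module is not None:
        # Parameter is registered on a module: go through update_parameter_data
        # so any offloading hooks are respected.
        update_parameter_data(module, data, self.transform_name)
    else:
        # Fallback for a parameter that isn't registered to a module; offloading
        # handling would have to be added around this if the path is kept.
        self.transform.data.copy_(data)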

@@ -129,7 +135,7 @@ def _matmul_hadU(X, transpose=False):
input = hadK.view(1, K, K).to(input) @ input

# normalize
Contributor:

this comment might need to go too if we are not normalizing?

__all__ = ["apply_matrix_transform", "SingletonMatrixRegistry"]


class SingletonMatrixRegistry:
Contributor:

so that all matrices live in a single global key-value store, right?

Collaborator (Author):

yeah, we can expand this, but it seems like there will be a lot of repetition across decoder layers, for example.
I think if this grows too big in scope, we may have to consider other data stores to handle it
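
For context, a rough sketch of the singleton pattern under discussion; the PR's actual SingletonMatrixRegistry may differ in its interface.

import torch

class SingletonMatrixRegistry:
    # One process-wide instance backed by a single dict, so a matrix keyed e.g. by
    # Hadamard size is created once and then shared across all decoder layers.
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._store = {}
        return cls._instance

    def contains(self, key) -> bool:
        return key in self._store

    def get(self, key) -> torch.Tensor:
        return self._store[key]

    def set(self, key, value: torch.Tensor) -> None:
        self._store[key] = value

# Any two instantiations see the same underlying store.
assert SingletonMatrixRegistry() is SingletonMatrixRegistry()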

__all__ = ["apply_matrix_transform", "SingletonMatrixRegistry"]


class SingletonMatrixRegistry:
Contributor:

Is a new singleton class necessary? Since each matrix Transform requires its own registry, shouldn't this be implemented on the class itself?

class Hadamard(Transforms):
    registry: Dict[int, torch.Tensor] = {}

    def __new__(cls, size, empty, transform_name, *args, **kwargs):
        if empty:
            matrix = ...
        else:
            # setdefault so the generated matrix is actually stored in the cache
            matrix = cls.registry.setdefault(
                size, torch.Tensor(deterministic_hadamard_matrix(size=size))
            )
        return super().__new__(transform=matrix, transform_name=transform_name)

self.transform = torch.nn.Buffer(transform.to(dtype).to(device))
self.transform = torch.nn.Parameter(transform, requires_grad=False)
self.transform_name = transform_name
self.permutation = (
Contributor:

> For randomized hadamards, splits up the math so that the underlying hadamard can be cached and the randomness is introduced as a separate permutation matrix

Since permutations are specific to RandomHadamards, shouldn't we be implementing this logic on the RandomHadamard class, not the general Transforms class?
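
A minimal sketch of what the suggested split could look like, keeping the permutation on the subclass; the base-class shape is simplified here and the names are assumptions, not the repo's actual classes.

import torch

class Transforms(torch.nn.Module):
    # Simplified stand-in for the base class: holds only the shared transform matrix.
    def __init__(self, transform: torch.Tensor, transform_name: str):
        super().__init__()
        self.transform = torch.nn.Parameter(transform, requires_grad=False)
        self.transform_name = transform_name

class RandomHadamard(Transforms):
    # The permutation lives on the subclass that needs it, not on Transforms.
    def __init__(self, hadamard: torch.Tensor, transform_name: str):
        super().__init__(hadamard, transform_name)
        size = hadamard.shape[0]
        perm = torch.eye(size, dtype=hadamard.dtype)[torch.randperm(size)]
        self.permutation = torch.nn.Parameter(perm, requires_grad=False)

    def materialize(self) -> torch.Tensor:
        # The applied transform composes the (cached, shared) Hadamard with this
        # instance's own permutation.
        return self.transform @ self.permutation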
