[Transforms] Apply, serialize, deserialize by dsikka · Pull Request #276 · vllm-project/compressed-tensors

dsikka · 2025-03-11T19:40:24Z

Summary

Adds support for applying transforms to models during quantization. For now, only layer weights are targeted.

Process

This step processes the provided transforms config, generates a transform_data object for each layer that records runtime args, and attaches each individual transform to the model's layers.
Because multiple transforms can be applied to the same parameter, an index is appended to the parameter name to tell them apart. Transform parameters therefore follow the naming convention {parameter_type}_transform_{idx}, e.g. "weight_transform_0" or "input_activation_transform_0", to match the existing convention of weights, input_activations, and output_activations.
TransformData holds a dictionary of all transform-relevant runtime data and is attached to the layer as "transform_data". The dictionary keys are the transform parameter names. Note: if we later add a layer that infers runtime/call args on the fly, we can potentially remove TransformData, but that is an optimization we can discuss in the future.
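The naming convention can be sketched in plain Python. This is only an illustration, not the code in this PR: the helper next_transform_name and the plain dict standing in for TransformData are hypothetical. Only the {parameter_type}_transform_{idx} pattern and the 'type' / 'call_args' keys are taken from the description.

```python
import re


# Hypothetical helper (not part of compressed-tensors): pick the next free
# index for a transform attached to a given parameter type.
def next_transform_name(existing_names, parameter_type):
    pattern = re.compile(rf"^{re.escape(parameter_type)}_transform_(\d+)$")
    used = [int(m.group(1)) for name in existing_names if (m := pattern.match(name))]
    idx = max(used) + 1 if used else 0
    return f"{parameter_type}_transform_{idx}"


# A plain dict standing in for TransformData.data: keys are transform
# parameter names, values hold the runtime information for that transform.
transform_data = {}
for transform_type in ("hadamard", "hadamard"):
    name = next_transform_name(transform_data.keys(), "weight")
    transform_data[name] = {"type": transform_type, "call_args": {}}

print(list(transform_data))  # ['weight_transform_0', 'weight_transform_1']
```

Keeping the index in the name means two transforms on the same parameter never collide, and the same key can be used to look up both the parameter on the layer and its runtime data.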

Apply

Utils have also been added to apply the transforms to the weights when applying QDQ. This is currently handled by apply_transforms_to_parameter and apply_inverse_transforms_to_parameter, which sandwich QDQ. This functionality will be extended further, and will likely move out of the forward method, as support for activation transforms is added.
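To make the "sandwich" concrete, here is a rough pure-Python sketch of transform, QDQ, inverse transform. It does not call apply_transforms_to_parameter or apply_inverse_transforms_to_parameter, whose signatures are not shown in this description. The right-multiplication by the Hadamard matrix, the single per-tensor scale, and all function names below are assumptions made for illustration.

```python
def hadamard(n):
    """Sylvester construction of an unnormalized Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fake_quantize(w, scale):
    # Symmetric round-to-nearest quantize-dequantize with a single scale.
    return [[round(x / scale) * scale for x in row] for row in w]


def qdq_with_transform(weight, scale):
    n = len(weight[0])
    h = hadamard(n)
    # An unnormalized Hadamard matrix satisfies H @ H^T = n * I,
    # so its inverse is H^T / n.
    h_inv = [[x / n for x in row] for row in transpose(h)]
    transformed = matmul(weight, h)                # apply the transform
    quantized = fake_quantize(transformed, scale)  # QDQ in the transformed space
    return matmul(quantized, h_inv)                # apply the inverse transform


weight = [[0.5, -1.25, 2.0, 0.75], [1.0, 0.0, -0.5, 3.0]]
restored = qdq_with_transform(weight, scale=0.25)
max_err = max(abs(a - b) for ra, rb in zip(weight, restored) for a, b in zip(ra, rb))
print(max_err)
```

Because the inverse transform undoes the rotation, the only difference between weight and restored is the rounding error introduced by QDQ in the transformed space.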

Serialize/Deserialize

Serialization currently does not compress the transforms; they are saved to disk uncompressed (we will either fuse them in or compress them in a follow-up). The quantization_config in config.json is also extended with a transforms_config.
For deserialization, a load_transforms function has been added to load the parameters from disk and attach the relevant runtime information. However, this requires the transformers PR listed under Dependencies.
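Since the transforms are stored uncompressed under the naming convention described in Process, a loader can pick them out of a checkpoint by key name alone. The sketch below shows only that key-matching step. It is not the load_transforms implementation: group_transform_keys and the checkpoint keys are made up for illustration, and real checkpoints may lay out keys differently.

```python
import re
from collections import defaultdict

# Matches names such as "model.layers.0.self_attn.q_proj.weight_transform_0".
# The module prefix is optional so a bare "weight_transform_0" also matches.
TRANSFORM_KEY = re.compile(r"^(?:(?P<module>.+)\.)?(?P<param>\w+_transform_\d+)$")


def group_transform_keys(checkpoint_keys):
    """Map each module path to the transform parameter names saved for it."""
    grouped = defaultdict(list)
    for key in checkpoint_keys:
        match = TRANSFORM_KEY.match(key)
        if match:
            grouped[match.group("module") or ""].append(match.group("param"))
    return dict(grouped)


# Hypothetical checkpoint keys, for illustration only.
keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.q_proj.weight_scale",
    "model.layers.0.self_attn.q_proj.weight_transform_0",
    "model.layers.0.self_attn.q_proj.weight_transform_1",
    "lm_head.weight",
]
print(group_transform_keys(keys))
```

Each matched name could then be registered on its module and paired with the corresponding entry in that module's transform_data.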

Examples:

# Apply a transform config to a model

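import torch

from compressed_tensors.quantization import process_transforms_config

# ModuleTarget, TransformationArgs, TransformationScheme, and TransformationConfig
# are assumed to come from the transforms support added in #275.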

targets = ["Linear"]
module_targets = [ModuleTarget.WEIGHT]
linear_layer_args = TransformationArgs(
    targets=targets, module_targets=module_targets
)

scheme = TransformationScheme(
    transform_type="hadamard",
    groups=[linear_layer_args],
    transform_creation_args={"size": 512},
)
config = TransformationConfig(
    transform_groups={
        "transform_0": scheme,
    }
)

model = torch.nn.Linear(512, 512)

model = process_transforms_config(model=model, transforms_config=config)

# Once processed, the model will have the following parameters:
>> model.weight_transform_0
Parameter containing:
        tensor([[ 1.,  1.,  1.,  ...,  1.,  1.,  1.],
                [ 1., -1.,  1.,  ..., -1.,  1., -1.],
                [ 1.,  1., -1.,  ...,  1., -1., -1.],
                ...,
                [ 1., -1.,  1.,  ..., -1.,  1., -1.],
                [ 1.,  1., -1.,  ...,  1., -1., -1.],
                [ 1., -1., -1.,  ..., -1., -1.,  1.]], dtype=torch.bfloat16)

>> model.transform_data
TransformData(data={'weight_transform_0':
      {
          'call_args': defaultdict(),
          'type': "hadamard"
      }
  }
)

# Apply the transform by running a forward pass (some_dummy_data is a placeholder input).
model(some_dummy_data)

Testing

This functionality (along with "[Transforms] Transform Arg, Scheme, and Data support" #275 and "[Transforms] Transform Registry Support" #274) has been tested with llm-compressor by quantizing and evaluating a Llama 3.1-1b model with the QuantizationModifier, showing performance close to or surpassing GPTQ on gsm8k.

Dependencies:

Requires: "[WIP] Add support to load models with transforms" (huggingface/transformers#36621) to be fully functional, but this PR should be able to land before then.

brian-dellabetta · 2025-03-11T22:39:58Z

src/compressed_tensors/quantization/lifecycle/apply.py



+def process_transforms_config(
+    transforms_config: TransformationConfig,


Another example where I think consistency in naming convention will help users. Something like this?

Suggested change

- transforms_config: TransformationConfig,
+ transform_config: TransformConfig,

kylesayrs · 2025-04-28T21:30:05Z

src/compressed_tensors/quantization/lifecycle/apply.py

+                state_dict[weight_name] = f.get_tensor(weight_name)
+
+    for name, submodule in iter_named_leaf_modules(model):
+        transform_data = getattr(submodule, "transform_data", None)


> TransformData includes a dictionary of all the transforms-relevant runtime data, and is attached to the layer as "transform_data"

To me, it seems like this information is essentially duplicating the information of the scheme/args? Why not attach the scheme/args to the module, rather than creating a new abstraction?

brian-dellabetta · 2025-04-28T22:09:53Z

src/compressed_tensors/quantization/lifecycle/apply.py

            )


+def process_transforms_config(


I missed that this actually modifies the model by registering transforms on it. The name process_transforms_config is pretty innocuous for something that modifies the model this significantly. Consider renaming it to add_transforms_to_model or something more explicit?

dsikka changed the title ~~add apply, serialize, deserialize support~~ [Transforms] Apply, serialize, deserialize Mar 11, 2025

dsikka mentioned this pull request Mar 11, 2025

[Transforms] Initial Implementation #277

Closed

dsikka marked this pull request as ready for review March 11, 2025 21:53

brian-dellabetta reviewed Mar 11, 2025

brian-dellabetta requested review from horheynm, kylesayrs and rahul-tuli March 17, 2025 16:54

rahul-tuli approved these changes Mar 18, 2025

dsikka mentioned this pull request Mar 20, 2025

[Transforms] Transform Registry Support #274

Closed

dsikka commented Mar 20, 2025

dsikka force-pushed the transform_arg_support branch from 358075b to fadaaf8 Compare March 22, 2025 21:02

dsikka force-pushed the transform_apply_support branch from e7cdea4 to 063d62d Compare March 22, 2025 21:06

dsikka force-pushed the transform_arg_support branch from fadaaf8 to 86e805d Compare March 23, 2025 02:28

dsikka added 3 commits March 23, 2025 02:29

add apply, serialize, deserialize support

adda166

update

05c4735

move loading step to transformers

579337d

dsikka force-pushed the transform_apply_support branch from 28c7af5 to 579337d Compare March 23, 2025 02:35

kylesayrs reviewed Apr 28, 2025

brian-dellabetta approved these changes Apr 28, 2025

kylesayrs marked this pull request as draft July 14, 2025 18:41

dsikka closed this Aug 11, 2025

dsikka wants to merge 3 commits into transform_arg_support from transform_apply_support
