What's Changed
- [Quantization] Refactor initialize for activation shape inference by @kylesayrs in #476
- Add block strategy and structure validation by @kylesayrs in #483
- [Tests] Mock Observers, Static Lifecycle Tests by @kylesayrs in #482
- [Attention] Attention head quantization strategy by @kylesayrs in #481
- Drop Python 3.9 and add 3.13 to testing by @dhuangnm in #486
- Remove static token quantization by @kylesayrs in #487
- [Misc] Remove unused config name definitions by @kylesayrs in #332
- Remove unused `find_name_or_class_matches` util by @kylesayrs in #488
- Update NVFP4 default observer by @dsikka in #493
- Switch test runners to use the vllm runners by @dhuangnm in #496
- Tensor Group Validation by @kylesayrs in #490
- Update neuralmagic --> vllm-project for links by @mgoin in #495
- One more place to update the runner by @dhuangnm in #497
- Update to allow READ-only access by @andy-neuma in #499
- Update workflows to use new vllm infra by @dhuangnm in #500
- Remove FP8_DTYPE; use FP8_E4M3_DATA instead by @dsikka in #501
- [MXFP4] Add MXFP4 Compressor by @dsikka in #502
- [Transform] Attention/Cache transforms by @kylesayrs in #436
- [MXFP4] Add scale generation utils by @dsikka in #503 (see the first sketch after this list)
- Update error message for column/group_size mismatch by @HDCharles in #505 (see the second sketch after this list)
- Fix bug in matrix_multiply.py by @HDCharles in #507
- feat: support zero-point decompression for asymmetric quantization (packed) by @Etelis in #463 (see the third sketch after this list)
- [Attention] R3 Attention Transform by @kylesayrs in #485
- [Quantization Args] Add scale and zp dtype by @dsikka in #508
- Switch to use h100 runner and remove nightly related workflows by @dhuangnm in #515
- [Quant Args] Clean-up by @dsikka in #513
- [Tests] Small Fixes by @dsikka in #516
- Fix dtype by @dsikka in #517
- patch_attrs helper by @brian-dellabetta in #519
- Fix `match_modules_set` to work with MoE by @HDCharles in #524
- [MXFP4] Add calibration support by @dsikka in #509
- fix qparams decompression by @shanjiaz in #514
- Revert "fix qparams decompression (#514)" by @dsikka in #527
- Update quantize_and_pack_int4.ipynb to use compress_model; remove compress_quantized_weights by @zkl-ai in #526
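For context on the MXFP4 scale-generation work (#502, #503, #509), here is a minimal sketch of how MX-style block scales are typically derived. It assumes the OCP MX convention (one power-of-two E8M0 scale per 32-element block; FP4 E2M1 has a maximum exponent of 2, i.e. a max value of 6.0); `mx_scales` is an illustrative name, not the library's actual util.

```python
import torch

BLOCK_SIZE = 32  # MX convention: one shared scale per 32-element block
E2M1_EMAX = 2    # largest exponent representable by FP4 E2M1 (max value 6.0)

def mx_scales(weight: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper; assumes weight.numel() is divisible by BLOCK_SIZE.
    blocks = weight.reshape(-1, BLOCK_SIZE)
    amax = blocks.abs().amax(dim=-1)
    amax = torch.clamp(amax, min=torch.finfo(weight.dtype).tiny)
    # E8M0 scales are pure powers of two: floor(log2(amax)) - emax_elem
    exponent = torch.floor(torch.log2(amax)) - E2M1_EMAX
    return torch.exp2(exponent)
```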
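The column/group_size constraint behind #505 (and the group validation in #490) falls out of the reshape used for per-group quantization: a (rows, cols) weight can only be split into groups of size g when cols is divisible by g. A hedged sketch with a hypothetical `quantize_per_group` helper, assuming symmetric int4:

```python
import torch

def quantize_per_group(w: torch.Tensor, group_size: int):
    # Hypothetical helper illustrating why columns must divide evenly:
    # per-group quantization reshapes (rows, cols) -> (rows, cols // g, g).
    rows, cols = w.shape
    if cols % group_size != 0:
        raise ValueError(
            f"columns ({cols}) must be divisible by group_size ({group_size})"
        )
    groups = w.reshape(rows, cols // group_size, group_size)
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scales), -8, 7)  # int4 range [-8, 7]
    return q.reshape(rows, cols).to(torch.int8), scales.squeeze(-1)
```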
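The packed asymmetric decompression added in #463 pairs packed int4 weights with packed zero points and dequantizes as w ~= (q - zp) * scale. A minimal sketch with illustrative names; the low-nibble-first packing order and broadcastable scale/zero-point shapes are assumptions, not the library's API:

```python
import torch

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    # Each uint8 byte holds two 4-bit values; nibble order assumed low-first.
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    return torch.stack((low, high), dim=-1).reshape(*packed.shape[:-1], -1)

def dequantize_asymmetric(packed_q, packed_zp, scale):
    # scale and zero points are assumed broadcastable against the
    # unpacked weights (e.g., per-channel with shape (rows, 1)).
    q = unpack_int4(packed_q).to(torch.float32)
    zp = unpack_int4(packed_zp).to(torch.float32)
    # asymmetric dequantization: w ~= (q - zp) * scale
    return (q - zp) * scale
```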
New Contributors
- @andy-neuma made their first contribution in #499
- @HDCharles made their first contribution in #505
- @Etelis made their first contribution in #463
- @zkl-ai made their first contribution in #526
Full Changelog: 0.12.2...0.13.0