What's Changed
- [Quantization] Refactor initialize for activation shape inference by @kylesayrs in #476
- Add block strategy and structure validation by @kylesayrs in #483
- [Tests] Mock Observers, Static Lifecycle Tests by @kylesayrs in #482
- [Attention] Attention head quantization strategy by @kylesayrs in #481
- Drop Python 3.9 and add 3.13 to testing by @dhuangnm in #486
- Remove static token quantization by @kylesayrs in #487
- [Misc] Remove unused config name definitions by @kylesayrs in #332
- Remove unused `find_name_or_class_matches` util by @kylesayrs in #488
- Update NVFP4 default observer by @dsikka in #493
- Switch test runners to use the vllm runners by @dhuangnm in #496
- Tensor Group Validation by @kylesayrs in #490
- Update neuralmagic --> vllm-project for links by @mgoin in #495
- One more place to update the runner by @dhuangnm in #497
- Update to allow READ-only access by @andy-neuma in #499
- Update workflows to use new vllm infra by @dhuangnm in #500
- Remove FP8_DTYPE; use FP8_E4M3_DATA instead by @dsikka in #501
- [MXFP4] Add MXFP4 Compressor by @dsikka in #502
- [Transform] Attention/Cache transforms by @kylesayrs in #436
- [MXFP4] Add scale generation utils by @dsikka in #503 (see the first sketch after this list)
- Update error message for column/group_size mismatch by @HDCharles in #505 (see the second sketch after this list)
- Fix bug in matrix_multiply.py by @HDCharles in #507
- feat: support zero-point decompression for asymmetric quantization (packed) by @Etelis in #463 (see the third sketch after this list)
- [Attention] R3 Attention Transform by @kylesayrs in #485
- [Quantization Args] Add scale and zp dtype by @dsikka in #508
- Switch to use h100 runner and remove nightly related workflows by @dhuangnm in #515
- [Quant Args] Clean-up by @dsikka in #513
- [Tests] Small Fixes by @dsikka in #516
- Fix dtype by @dsikka in #517
- patch_attrs helper by @brian-dellabetta in #519
- Fix `match_modules_set` to work with MoE by @HDCharles in #524
- [MXFP4] Add calibration support by @dsikka in #509
- fix qparams decompression by @shanjiaz in #514
- Revert "fix qparams decompression (#514)" by @dsikka in #527
- Update quantize_and_pack_int4.ipynb to use compress_model; remove compress_quantized_weights by @zkl-ai in #526
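For context on the MXFP4 scale-generation work (#502, #503, #509), here is a minimal sketch of how MX-style block scales are typically derived. It assumes the OCP MX convention (one power-of-two E8M0 scale per 32-element block; FP4 E2M1 has a maximum exponent of 2, i.e. a max value of 6.0); `mx_scales` is an illustrative name, not the library's actual util.

```python
import torch

BLOCK_SIZE = 32  # MX convention: one shared scale per 32-element block
E2M1_EMAX = 2    # largest exponent representable by FP4 E2M1 (max value 6.0)

def mx_scales(weight: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper; assumes weight.numel() is divisible by BLOCK_SIZE.
    blocks = weight.reshape(-1, BLOCK_SIZE)
    amax = blocks.abs().amax(dim=-1)
    amax = torch.clamp(amax, min=torch.finfo(weight.dtype).tiny)
    # E8M0 scales are pure powers of two: floor(log2(amax)) - emax_elem
    exponent = torch.floor(torch.log2(amax)) - E2M1_EMAX
    return torch.exp2(exponent)
```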
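The column/group_size constraint behind #505 (and the group validation in #490) falls out of the reshape used for per-group quantization: a (rows, cols) weight can only be split into groups of size g when cols is divisible by g. A hedged sketch with a hypothetical `quantize_per_group` helper, assuming symmetric int4:

```python
import torch

def quantize_per_group(w: torch.Tensor, group_size: int):
    # Hypothetical helper illustrating why columns must divide evenly:
    # per-group quantization reshapes (rows, cols) -> (rows, cols // g, g).
    rows, cols = w.shape
    if cols % group_size != 0:
        raise ValueError(
            f"columns ({cols}) must be divisible by group_size ({group_size})"
        )
    groups = w.reshape(rows, cols // group_size, group_size)
    scales = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scales), -8, 7)  # int4 range [-8, 7]
    return q.reshape(rows, cols).to(torch.int8), scales.squeeze(-1)
```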
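The packed asymmetric decompression added in #463 pairs packed int4 weights with packed zero points and dequantizes as w ~= (q - zp) * scale. A minimal sketch with illustrative names; the low-nibble-first packing order and broadcastable scale/zero-point shapes are assumptions, not the library's API:

```python
import torch

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    # Each uint8 byte holds two 4-bit values; nibble order assumed low-first.
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    return torch.stack((low, high), dim=-1).reshape(*packed.shape[:-1], -1)

def dequantize_asymmetric(packed_q, packed_zp, scale):
    # scale and zero points are assumed broadcastable against the
    # unpacked weights (e.g., per-channel with shape (rows, 1)).
    q = unpack_int4(packed_q).to(torch.float32)
    zp = unpack_int4(packed_zp).to(torch.float32)
    # asymmetric dequantization: w ~= (q - zp) * scale
    return (q - zp) * scale
```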
New Contributors
- @andy-neuma made their first contribution in #499
- @HDCharles made their first contribution in #505
- @Etelis made their first contribution in #463
- @zkl-ai made their first contribution in #526
Full Changelog: 0.12.2...0.13.0