MyQuantLib

A native, hackable neural network quantization library built on PyTorch, for research purposes.

Author: Evans Li
Contact: liangkeshulizi@gmail.com

Introduction

Neural network quantization is a powerful technique and an active research topic. It drastically reduces the memory and compute needed to run a model by representing weights and activations at lower precision, such as 8-bit integers instead of 32-bit floating-point numbers. This makes models more efficient, especially on resource-constrained hardware such as mobile phones and edge devices.
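As a concrete illustration, the snippet below shows the affine mapping used in Jacob et al. (2018) between a real value x and an int8 code q, with x ≈ scale * (q - zero_point). It is a minimal sketch in plain Python, not MyQuantLib's API: the function names and the example scale of 2^-7 are illustrative assumptions. Storing q in one byte instead of a 4-byte float32 is where the 4x memory saving comes from.

```python
# Minimal sketch of affine (scale + zero-point) int8 quantization.
# Names and qparams here are illustrative, not MyQuantLib's API.
INT8_MIN, INT8_MAX = -128, 127


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point  # Python's round() rounds half to even
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover an approximation of the real value: x_hat = scale * (q - zero_point)."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    scale, zero_point = 2 ** -7, 0  # hypothetical qparams covering roughly [-1, 1)
    for x in [-1.0, -0.3, 0.0, 0.25, 0.3, 1.0]:
        q = quantize(x, scale, zero_point)
        print(f"{x:+.4f} -> {q:4d} -> {dequantize(q, scale, zero_point):+.7f}")
```

Values inside the representable range come back within scale / 2 of the original; 1.0 lies just outside it and is clamped to 127, which dequantizes to 0.9921875.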

PyTorch provides native quantization APIs (Eager Mode and FX Graph Mode), which rely on backend libraries such as QNNPACK and FBGEMM for inference under the hood. However, these libraries are mostly written in C/C++ or assembly, and some are closed-source, which makes it hard to understand what is actually happening and impossible to tinker with the algorithms. Using them with custom backends (hardware accelerators) also means relying heavily on these libraries.

MyQuantLib implements the quantization algorithm and quantized inference in plain PyTorch using native torch.int8 tensors. This makes it easy to understand how quantized layers are computed, to implement them on custom backends (hardware accelerators), and to experiment with different algorithms. In addition, it uses a dry-run pass to connect the quantization parameters (qparams) of consecutive layers and to pre-compute all parameters before the real forward pass. This improves performance and removes any need for floating-point computation during inference (notably for the bias), so the quantized model can run on hardware without floating-point support.
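To make the integer-only idea concrete, here is one way such pre-computation can work for a fully connected layer, following the derivation in Jacob et al. (2018). Assumptions: symmetric weights (zero point 0), per-tensor qparams, and hypothetical helper names (`precompute`, `int_linear`); this is a sketch, not MyQuantLib's implementation. The offline step folds the float scales into an integer multiplier `m0` and a right shift, so that s_x * s_w / s_y ≈ m0 * 2^-shift, and quantizes the bias to int32 at the accumulator scale s_x * s_w. The forward pass then needs only integer multiply, add, and shift.

```python
import math


def precompute(s_x, s_w, s_y, bias):
    """Offline step: fold float qparams into integers (hypothetical helper)."""
    m = s_x * s_w / s_y  # real requantization multiplier, assumed in (0, 1)
    mant, exp = math.frexp(m)  # m = mant * 2**exp with 0.5 <= mant < 1
    m0 = round(mant * (1 << 31))  # 31-bit fixed-point mantissa
    shift = 31 - exp  # so that m ≈ m0 * 2**(-shift)
    if m0 == 1 << 31:  # mantissa rounded up to 1.0: renormalize
        m0 >>= 1
        shift -= 1
    # The bias is added to the int32 accumulator, whose scale is s_x * s_w.
    q_bias = [round(b / (s_x * s_w)) for b in bias]
    return m0, shift, q_bias


def int_linear(q_x, q_w, z_x, z_y, m0, shift, q_bias):
    """Integer-only y = W x + b on int8 codes; no floating point at run time."""
    out = []
    for row, qb in zip(q_w, q_bias):
        acc = qb + sum(w * (x - z_x) for w, x in zip(row, q_x))  # int32 accumulator
        v = (acc * m0 + (1 << (shift - 1))) >> shift  # acc * m, rounded half up
        out.append(max(-128, min(127, v + z_y)))  # add output zero point, saturate
    return out


if __name__ == "__main__":
    s_x, z_x = 2 ** -4, 0  # input qparams (illustrative)
    s_w = 2 ** -6  # symmetric weight scale (illustrative)
    s_y, z_y = 0.01, 0  # output qparams, e.g. from calibration
    m0, shift, q_bias = precompute(s_x, s_w, s_y, [0.1, -0.2])

    q_x = [8, -4, 16]  # x = [0.5, -0.25, 1.0]
    q_w = [[32, 16, -32], [-64, 32, 16]]  # W = [[0.5, 0.25, -0.5], [-1.0, 0.5, 0.25]]
    q_y = int_linear(q_x, q_w, z_x, z_y, m0, shift, q_bias)
    print(q_y, [s_y * (q - z_y) for q in q_y])  # float reference: [-0.2125, -0.575]
```

With these values the integer pipeline returns [-21, -58], i.e. about [-0.21, -0.58], within one output step (0.01) of the float result. The plain `>>` rounding here is a simplification of the saturating fixed-point routines used by gemmlowp.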

MyQuantLib supports 8-bit Static Post-Training Quantization (PTQ). This is a method where quantization occurs after the model is trained, without requiring retraining. It works by analyzing the activations of the pre-trained model on a representative dataset to gather statistics like the min and max values. These statistics are used to determine the scaling factors for the weights and activations.
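The calibration step described above can be sketched as a running min/max tracker whose final range is turned into an asymmetric int8 scale and zero point. The class name `MinMaxCalibrator` and the way batches are fed in are illustrative assumptions, not MyQuantLib's API; in practice the observed values would be one layer's activations recorded while the representative dataset is run through the float model.

```python
import math


class MinMaxCalibrator:
    """Hypothetical helper: track the running min/max of one tensor across batches."""

    def __init__(self):
        self.min_val = math.inf
        self.max_val = -math.inf

    def observe(self, values):
        """Update the range with a batch of observed activation values."""
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def qparams(self, qmin=-128, qmax=127):
        """Turn the observed range into (scale, zero_point) for int8."""
        # Extend the range to include 0.0 so that zero is exactly representable.
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        if scale == 0.0:  # degenerate range (all zeros): any positive scale works
            scale = 1.0
        zero_point = qmin - round(lo / scale)
        return scale, max(qmin, min(qmax, zero_point))


if __name__ == "__main__":
    calib = MinMaxCalibrator()
    for batch in ([-0.5, 0.2, 1.7], [-1.0, 3.0], [0.0, 2.4]):  # stand-in activations
        calib.observe(batch)
    print(calib.min_val, calib.max_val, calib.qparams())
```

With the stand-in batches the observed range is [-1.0, 3.0], giving scale = 4/255 and zero_point = -64, so -1.0 maps to -128 and 3.0 maps to 127.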

This package started as a side project of one of my undergraduate research projects. It is inspired by gemmlowp and Jacob et al. (2018).

Dynamic quantization, variable-bit-width quantization, and support for more operations will be added in the near future.

Installation

To install the project from PyPI, run:

pip install py-myquantlib

Alternatively, to clone the repository from GitHub and install it locally (recommended), run:

git clone https://github.com/liangkeshulizi/myquantlib.git
pip install -e /path/to/myquantlib

To run the example, cd into the repository and run:

cd myquantlib
python example.py

This trains and quantizes an example CNN model on your machine. For detailed usage, see example.py.

Contribution

All kinds of contributions are welcome. Please open an issue before submitting a pull request.
