# `Getting Started with Intel® Neural Compressor for Quantization` Sample

This sample is a getting-started tutorial for Intel® Neural Compressor (INC). It demonstrates how to perform INT8 quantization on a Hugging Face BERT model and shows how to achieve performance gains on Intel hardware.

| Area | Description
|:--- |:--- |
| What you will learn | How to quantize a BERT model using Intel® Neural Compressor
| Time to complete | 20 minutes
| Category | Code Optimization

## Purpose

Intel® Neural Compressor comes with many options for deep learning model compression, one of them being INT8 quantization. Quantization reduces the size of the model, which enables faster inference. The approach trades some accuracy for the reduced size; however, Intel® Neural Compressor provides automated accuracy-driven tuning recipes that allow you to quantize your model while maintaining your accuracy goals.

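For instance, the accuracy goal can be stated directly in the quantization configuration. The following is a minimal sketch assuming the Intel® Neural Compressor 2.x configuration API; the threshold and trial count are illustrative values, not settings from this sample:

```
from neural_compressor.config import (AccuracyCriterion, PostTrainingQuantConfig,
                                      TuningCriterion)

# Accept at most a 1% relative accuracy drop, and stop searching after 50 trials.
conf = PostTrainingQuantConfig(
    approach="static",
    accuracy_criterion=AccuracyCriterion(criterion="relative", tolerable_loss=0.01),
    tuning_criterion=TuningCriterion(max_trials=50),
)
```
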
The sample starts by loading a BERT model from Hugging Face. After loading the model, we set up an evaluation function, built on the PyTorch* Dataset and DataLoader classes, that measures the metric we care about. Using this evaluation function, Intel® Neural Compressor can perform both post-training static and post-training dynamic quantization to achieve the speedups.

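In outline, the quantization step looks like the sketch below (again assuming the INC 2.x API; the checkpoint name and save path are illustrative, and the notebook's actual code may differ):

```
from transformers import AutoModelForSequenceClassification
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

# Illustrative FP32 model; the notebook may use a different fine-tuned BERT.
fp32_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Post-training *dynamic* quantization needs no calibration data. The static
# approach would additionally take a calib_dataloader (built from dataset.py)
# and an eval_func so INC can tune against your accuracy goal.
conf = PostTrainingQuantConfig(approach="dynamic")
q_model = quantization.fit(model=fp32_model, conf=conf)
q_model.save("./int8_model")
```
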
## Prerequisites

| Optimized for | Description
|:--- |:--- |
| OS | Ubuntu* 20.04 (or newer)
| Hardware | Intel® Xeon® Scalable processor family
| Software | Intel® AI Analytics Toolkit (AI Kit)

### For Local Development Environments

You will need to download and install the following toolkits, tools, and components to use the sample.

- **Intel® AI Analytics Toolkit (AI Kit)**

  You can get the AI Kit from [Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#analytics-kit). <br> See [*Get Started with the Intel® AI Analytics Toolkit for Linux**](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux) for AI Kit installation information and post-installation steps and scripts.

- **Jupyter Notebook**

  Install using PIP: `pip install notebook`. <br> Alternatively, see [*Installing Jupyter*](https://jupyter.org/install) for detailed installation instructions.

### For Intel® DevCloud

The necessary tools and components are already installed in the environment. You do not need to install additional components. See [Intel® DevCloud for oneAPI](https://devcloud.intel.com/oneapi/get_started/) for information.

### Additional Packages

You will need to install the additional packages listed in *requirements.txt*:
```
python -m pip install -r requirements.txt
```

## Key Implementation Details

The sample contains one Jupyter Notebook and one Python script.

### Jupyter Notebook

|Notebook |Description
|:--- |:--- |
|`quantize_with_inc.ipynb` | Getting started tutorial for using Intel® Neural Compressor for PyTorch*

### Python Script

|Script |Description
|:--- |:--- |
|`dataset.py` | The script provides a PyTorch* Dataset class that tokenizes text data

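For orientation, the sketch below shows the general shape of such a Dataset. The class name, checkpoint, and parameters are illustrative, not the actual contents of `dataset.py`:

```
from torch.utils.data import Dataset
from transformers import AutoTokenizer

class TokenizedTextDataset(Dataset):
    """Hypothetical example: wraps raw strings and returns BERT-ready tensors."""

    def __init__(self, texts, model_name="bert-base-uncased", max_length=128):
        self.texts = texts
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Tokenize one sample into fixed-length input_ids/attention_mask tensors.
        encoding = self.tokenizer(
            self.texts[idx],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )
        return {key: value.squeeze(0) for key, value in encoding.items()}
```
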
## Run the `Getting Started with Intel® Neural Compressor for Quantization` Sample

> **Note**: If you have not already done so, set up your CLI
> environment by sourcing the `setvars` script in the root of your oneAPI installation.
>
> Linux*:
> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
> - For private installations: `. ~/intel/oneapi/setvars.sh`
> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
>
> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html).

### On Linux*

#### Activate Conda

1. Activate the Conda environment.

   ```
   conda activate pytorch
   ```

   By default, the AI Kit is installed in the `/opt/intel/oneapi` folder and requires root privileges to manage it.

   You can choose to activate the Conda environment without root access. To bypass root access and manage your own copy of the Conda environment, clone and then activate it using commands similar to the following.

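   ```
   # "usr_pytorch" is only an example name for the cloned environment.
   conda create --name usr_pytorch --clone pytorch
   conda activate usr_pytorch
   ```
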
#### Run the Notebook

1. Launch Jupyter Notebook.
   ```
   jupyter notebook --ip=0.0.0.0
   ```
2. Follow the instructions to open the URL with the token in your browser.
3. Locate and select the Notebook.
   ```
   quantize_with_inc.ipynb
   ```
4. Change the kernel to **pytorch**.
5. Run every cell in the Notebook in sequence.

#### Troubleshooting

If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.

### Run the Sample on Intel® DevCloud (Optional)

1. If you do not already have an account, request an Intel® DevCloud account at [*Create an Intel® DevCloud Account*](https://intelsoftwaresites.secure.force.com/DevCloud/oneapi).
2. On a Linux* system, open a terminal.
3. SSH into Intel® DevCloud.
   ```
   ssh DevCloud
   ```
   > **Note**: You can find information about configuring your Linux system and connecting to Intel DevCloud at Intel® DevCloud for oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started).
4. Follow the instructions to open the URL with the token in your browser.
5. Locate and select the Notebook.
   ```
   quantize_with_inc.ipynb
   ```
6. Change the kernel to **PyTorch (AI Kit)**.
7. Run every cell in the Notebook in sequence.

## Example Output

You should see an image showing the performance comparison and analysis between FP32 and INT8.

>**Note**: The image shown below is an example of a general performance comparison for inference speedup obtained by quantization. (Your results might be different.)



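The chart reflects the inference speedup measured by timing the FP32 and INT8 models. A minimal sketch of how such timings can be collected (not the notebook's actual benchmarking code; `fp32_model`, `q_model`, and `batch` are placeholders) is:

```
import time

import torch

def average_latency(model, inputs, n_iter=50):
    # Average wall-clock seconds per forward pass over n_iter runs.
    model.eval()
    with torch.no_grad():
        start = time.time()
        for _ in range(n_iter):
            model(**inputs)
    return (time.time() - start) / n_iter

# speedup = average_latency(fp32_model, batch) / average_latency(q_model, batch)
```
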
## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).