From 8f84b41953a03e61e30b3eb765f336aa1763e9c0 Mon Sep 17 00:00:00 2001
From: Karol Blaszczak
Date: Thu, 6 Feb 2025 11:18:58 +0100
Subject: [PATCH 1/2] [DOCS] Updating references to OV docs (#3250)

Updating links to refer to 2025 version of docs

Co-authored-by: sgolebiewski-intel
---
 README.md | 2 +-
 docs/Installation.md | 2 +-
 docs/ModelZoo.md | 2 +-
 .../weights_compression/Usage.md | 6 +++---
 .../torch/sparsity/movement/MovementSparsity.md | 8 ++++----
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 65bbb7f3e5c..32ee76e4106 100644
--- a/README.md
+++ b/README.md
@@ -514,4 +514,4 @@ You can opt-out at any time by running the following command in the Python envir

 `opt_in_out --opt_out`

-More information available on [OpenVINO telemetry](https://docs.openvino.ai/2024/about-openvino/additional-resources/telemetry.html).
+More information available on [OpenVINO telemetry](https://docs.openvino.ai/2025/about-openvino/additional-resources/telemetry.html).
diff --git a/docs/Installation.md b/docs/Installation.md
index 7130e784762..d03d7f76191 100644
--- a/docs/Installation.md
+++ b/docs/Installation.md
@@ -5,7 +5,7 @@ We suggest to install or use the package in the [Python virtual environment](htt

 NNCF supports multiple backends. Follow the corresponding installation guides and ensure your system meets the required specifications for your chosen backend:

-- OpenVINO™: [Install Guide](https://docs.openvino.ai/2024/get-started/install-openvino.html), [System Requirements](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html)
+- OpenVINO™: [Install Guide](https://docs.openvino.ai/2025/get-started/install-openvino.html), [System Requirements](https://docs.openvino.ai/2025/about-openvino/release-notes-openvino/system-requirements.html)
 - ONNX: [Install Guide](https://onnxruntime.ai/docs/install/)
 - PyTorch: [Install Guide](https://pytorch.org/get-started/locally/#start-locally)
 - TensorFlow: [Install Guide](https://www.tensorflow.org/install/)
diff --git a/docs/ModelZoo.md b/docs/ModelZoo.md
index 9dad44aaacc..8555ad42643 100644
--- a/docs/ModelZoo.md
+++ b/docs/ModelZoo.md
@@ -2,7 +2,7 @@

 Ready-to-use **Compressed LLMs** can be found on [OpenVINO Hugging Face page](https://huggingface.co/OpenVINO#models). Each model card includes NNCF parameters that were used to compress the model.

-**INT8 Post-Training Quantization** ([PTQ](../README.md#post-training-quantization)) results for public Vision, NLP and GenAI models can be found on [OpenVino Performance Benchmarks page](https://docs.openvino.ai/2024/about-openvino/performance-benchmarks.html). PTQ results for ONNX models are available in the [ONNX](#onnx) section below.
+**INT8 Post-Training Quantization** ([PTQ](../README.md#post-training-quantization)) results for public Vision, NLP and GenAI models can be found on [OpenVino Performance Benchmarks page](https://docs.openvino.ai/2025/about-openvino/performance-benchmarks.html). PTQ results for ONNX models are available in the [ONNX](#onnx) section below.

 **Quantization-Aware Training** ([QAT](../README.md#training-time-compression)) results for PyTorch and TensorFlow public models can be found below.

diff --git a/docs/usage/post_training_compression/weights_compression/Usage.md b/docs/usage/post_training_compression/weights_compression/Usage.md
index bcf89c9fc80..bd16fe06ef6 100644
--- a/docs/usage/post_training_compression/weights_compression/Usage.md
+++ b/docs/usage/post_training_compression/weights_compression/Usage.md
@@ -676,9 +676,9 @@ Accuracy/footprint trade-off for `microsoft/Phi-3-mini-4k-instruct`:

 ### Additional resources

-- [LLM Weight Compression](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html)
-- [Large Language Model Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)
-- [Inference with Hugging Face and Optimum Intel](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html)
+- [LLM Weight Compression](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/weight-compression.html)
+- [Large Language Model Inference Guide](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html)
+- [Inference with Hugging Face and Optimum Intel](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-optimum-intel.html)
 - [Optimum Intel documentation](https://huggingface.co/docs/optimum/intel/inference)
 - [Large Language Models Weight Compression Example](https://github.com/openvinotoolkit/nncf/blob/develop/examples/llm_compression/openvino/tiny_llama)
 - [Tuning Ratio and Group Size Example](https://github.com/openvinotoolkit/nncf/blob/develop/examples/llm_compression/openvino/tiny_llama_find_hyperparams)
diff --git a/nncf/experimental/torch/sparsity/movement/MovementSparsity.md b/nncf/experimental/torch/sparsity/movement/MovementSparsity.md
index f98a3aa622b..800b466827c 100644
--- a/nncf/experimental/torch/sparsity/movement/MovementSparsity.md
+++ b/nncf/experimental/torch/sparsity/movement/MovementSparsity.md
@@ -2,7 +2,7 @@

 [Movement Pruning (Sanh et al., 2020)](https://arxiv.org/pdf/2005.07683.pdf) is an effective learning-based unstructured sparsification algorithm, especially for Transformer-based models in transfer learning setup. [Lagunas et al., 2021](https://arxiv.org/pdf/2109.04838.pdf) extends the algorithm to sparsify by block grain size, enabling structured sparsity which can achieve device-agnostic inference acceleration.

-NNCF implements both unstructured and structured movement sparsification. The implementation is designed with a minimal set of configuration for ease of use. The algorithm can be applied in conjunction with other NNCF algorithms, e.g. quantization-aware training and knowledge distillation. The optimized model can be deployed and accelerated via [OpenVINO](https://docs.openvino.ai/2024/index.html) toolchain.
+NNCF implements both unstructured and structured movement sparsification. The implementation is designed with a minimal set of configuration for ease of use. The algorithm can be applied in conjunction with other NNCF algorithms, e.g. quantization-aware training and knowledge distillation. The optimized model can be deployed and accelerated via [OpenVINO](https://docs.openvino.ai/2025/index.html) toolchain.

 For usage explanation of the algorithm, let's start with an example configuration below which is targeted for BERT models.

@@ -37,11 +37,11 @@ This diagram is the sparsity level of BERT-base model over the optimization life

 1. **Unstructured sparsification**: In the first stage, model weights are gradually sparsified in the grain size specified by `sparse_structure_by_scopes`. This example will result in _BertAttention layers (Multi-Head Self-Attention)_ being sparsified in 32 by 32 block size, whereas _BertIntermediate, BertOutput layers (Feed-Forward Network)_ will be sparsified in its row or column respectively. The sparsification follows a predefined warmup schedule where users only have to specify the start `warmup_start_epoch` and end `warmup_end_epoch` and the sparsification strength proportional to `importance_regularization_factor`. Users might need some heuristics to find a satisfactory trade-off between sparsity and task performance. For more details on how movement sparsification works, please refer the original papers [1, 2] .

-2. **Structured masking and fine-tuning**: At the end of first stage, i.e. `warmup_end_epoch`, the sparsified model cannot be accelerated without tailored HW/SW but some sparse structures can be totally discarded from the model to save compute and memory footprint. NNCF provides mechanism to achieve structured masking by `"enable_structured_masking": true`, where it automatically resolves the structured masking between dependent layers and rewinds the sparsified parameters that does not participate in acceleration for task modeling. In the example above, the sparsity level has dropped after `warmup_end_epoch` due to structured masking and the model will continue to fine-tune thereafter. Currently, the automatic structured masking feature was tested on **_BERT, DistilBERT, RoBERTa, MobileBERT, Wav2Vec2, Swin, ViT, CLIPVisual_** architectures defined by [Hugging Face's transformers](https://huggingface.co/docs/transformers/index). Support for other architectures is not guaranteed. Users can disable this feature by setting `"enable_structured_masking": false`, where the sparse structures at the end of first stage will be frozen and training/fine-tuning will continue on unmasked parameters. Please refer next section to realize model inference acceleration with [OpenVINO](https://docs.openvino.ai/2024/index.html) toolchain.
+2. **Structured masking and fine-tuning**: At the end of first stage, i.e. `warmup_end_epoch`, the sparsified model cannot be accelerated without tailored HW/SW but some sparse structures can be totally discarded from the model to save compute and memory footprint. NNCF provides mechanism to achieve structured masking by `"enable_structured_masking": true`, where it automatically resolves the structured masking between dependent layers and rewinds the sparsified parameters that does not participate in acceleration for task modeling. In the example above, the sparsity level has dropped after `warmup_end_epoch` due to structured masking and the model will continue to fine-tune thereafter. Currently, the automatic structured masking feature was tested on **_BERT, DistilBERT, RoBERTa, MobileBERT, Wav2Vec2, Swin, ViT, CLIPVisual_** architectures defined by [Hugging Face's transformers](https://huggingface.co/docs/transformers/index). Support for other architectures is not guaranteed. Users can disable this feature by setting `"enable_structured_masking": false`, where the sparse structures at the end of first stage will be frozen and training/fine-tuning will continue on unmasked parameters. Please refer next section to realize model inference acceleration with [OpenVINO](https://docs.openvino.ai/2025/index.html) toolchain.

-## Inference Acceleration via [OpenVINO](https://docs.openvino.ai/2024/index.html)
+## Inference Acceleration via [OpenVINO](https://docs.openvino.ai/2025/index.html)

-Optimized models are compatible with OpenVINO toolchain. Use `compression_controller.export_model("movement_sparsified_model.onnx")` to export model in onnx format. Sparsified parameters in the onnx are in value of zero. Structured sparse structures can be discarded during ONNX translation to OpenVINO IR using [Model Conversion](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-to-ir.html) with utilizing [pruning transformation](https://docs.openvino.ai/2024/documentation/legacy-features/transition-legacy-conversion-api.html#transform). Corresponding IR is compressed and deployable with [OpenVINO Runtime](https://docs.openvino.ai/2024/openvino-workflow/running-inference.html). To quantify inference performance improvement, both ONNX and IR can be profiled using [Benchmark Tool](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/benchmark-tool.html).
+Optimized models are compatible with OpenVINO toolchain. Use `compression_controller.export_model("movement_sparsified_model.onnx")` to export model in onnx format. Sparsified parameters in the onnx are in value of zero. Structured sparse structures can be discarded during ONNX translation to OpenVINO IR using [Model Conversion](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-to-ir.html) with utilizing [pruning transformation](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/compressing-models-during-training/filter-pruning.html). Corresponding IR is compressed and deployable with [OpenVINO Runtime](https://docs.openvino.ai/2025/openvino-workflow/running-inference.html). To quantify inference performance improvement, both ONNX and IR can be profiled using [Benchmark Tool](https://docs.openvino.ai/2025/get-started/learn-openvino/openvino-samples/benchmark-tool.html).

 ## Getting Started

From be61f692ebe755948fd33beb8504f13fe8856995 Mon Sep 17 00:00:00 2001
From: Nikita Malinin
Date: Thu, 6 Feb 2025 11:49:24 +0100
Subject: [PATCH 2/2] Bump OV version (#3236) (#3256)

(cherry picked from commit f74caa9fbaf639306674719ec5b39a377bfcdab1)
---
 constraints.txt | 2 +-
 docs/Installation.md | 3 ++-
 .../llm_compression/openvino/smollm2_360m_fp8/requirements.txt | 2 +-
 examples/llm_compression/openvino/tiny_llama/requirements.txt | 2 +-
 .../openvino/tiny_llama_find_hyperparams/requirements.txt | 2 +-
 .../openvino/tiny_llama_synthetic_data/requirements.txt | 2 +-
 .../onnx/mobilenet_v2/requirements.txt | 2 +-
 .../yolov8_quantize_with_accuracy_control/requirements.txt | 2 +-
 .../requirements.txt | 2 +-
 .../openvino/mobilenet_v2/requirements.txt | 2 +-
 .../openvino/yolov8/requirements.txt | 2 +-
 .../yolov8_quantize_with_accuracy_control/requirements.txt | 2 +-
 .../tensorflow/mobilenet_v2/requirements.txt | 2 +-
 .../torch/mobilenet_v2/requirements.txt | 2 +-
 .../torch/ssd300_vgg16/requirements.txt | 2 +-
 .../torch_fx/resnet18/requirements.txt | 2 +-
 .../tensorflow/mobilenet_v2/requirements.txt | 2 +-
 .../torch/resnet18/requirements.txt | 2 +-
 18 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/constraints.txt b/constraints.txt
index 19b5946979b..46e9be3f455 100644
--- a/constraints.txt
+++ b/constraints.txt
@@ -1,5 +1,5 @@
 # Openvino
-openvino==2024.6.0
+openvino==2025.0.0

 # Pytorch
 torch==2.5.1
diff --git a/docs/Installation.md b/docs/Installation.md
index d03d7f76191..7e5998719a1 100644
--- a/docs/Installation.md
+++ b/docs/Installation.md
@@ -49,7 +49,8 @@ as well as the supported versions of Python:

 | NNCF      | OpenVINO   | PyTorch  | ONNX     | TensorFlow | Python |
 |-----------|------------|----------|----------|------------|--------|
-| `develop` | `2024.6.0` | `2.5.1`  | `1.17.0` | `2.15.1`   | `3.10` |
+| `develop` | `2025.0.0` | `2.5.1`  | `1.17.0` | `2.15.1`   | `3.10` |
+| `2.15.0`  | `2025.0.0` | `2.5.1`  | `1.17.0` | `2.15.1`   | `3.10` |
 | `2.14.1`  | `2024.6.0` | `2.5.1`  | `1.17.0` | `2.15.1`   | `3.10` |
 | `2.14.0`  | `2024.5.0` | `2.5.1`  | `1.17.0` | `2.15.1`   | `3.10` |
 | `2.13.0`  | `2024.4.0` | `2.4.0`  | `1.16.0` | `2.15.1`   | `3.8`* |
diff --git a/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt b/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt
index ae450477601..1bc45378c2b 100644
--- a/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt
+++ b/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt
@@ -1,5 +1,5 @@
 datasets
-openvino==2024.6
+openvino==2025.0
 optimum-intel[openvino]
 transformers
 onnx==1.17.0
diff --git a/examples/llm_compression/openvino/tiny_llama/requirements.txt b/examples/llm_compression/openvino/tiny_llama/requirements.txt
index e5df23bf41f..560cb416a2c 100644
--- a/examples/llm_compression/openvino/tiny_llama/requirements.txt
+++ b/examples/llm_compression/openvino/tiny_llama/requirements.txt
@@ -1,5 +1,5 @@
 transformers
 datasets==2.14.7
-openvino==2024.6
+openvino==2025.0
 optimum-intel[openvino]
 onnx==1.17.0
diff --git a/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt b/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt
index 2c229c69c17..57747a04031 100644
--- a/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt
+++ b/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt
@@ -1,7 +1,7 @@
 datasets
 whowhatbench @ git+https://github.com/andreyanufr/who_what_benchmark.git
 numpy>=1.23.5
-openvino==2024.6
+openvino==2025.0
 optimum-intel[openvino]>=1.13.0
 transformers>=4.35.2
 onnx==1.17.0
diff --git a/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt b/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt
index 77afd04dfd0..2c5c8c0c4ad 100644
--- a/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt
+++ b/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt
@@ -1,7 +1,7 @@
 torch==2.5.1
 datasets==3.0.1
 numpy>=1.23.5
-openvino==2024.6
+openvino==2025.0
 optimum-intel[openvino]>=1.13.0
 transformers>=4.35.2
 onnx==1.17.0
diff --git a/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt b/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt
index 402c10d49ae..ec8ee3ec3f6 100644
--- a/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt
+++ b/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt
@@ -4,5 +4,5 @@ scikit-learn
 fastdownload
 onnx==1.17.0
 onnxruntime==1.19.2
-openvino==2024.6
+openvino==2025.0
 numpy<2
diff --git a/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt b/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt
index 380e8499e80..4474931f57c 100644
--- a/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt
+++ b/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt
@@ -1,4 +1,4 @@
 ultralytics==8.3.22
 onnx==1.17.0
 onnxruntime==1.19.2
-openvino==2024.6
+openvino==2025.0
diff --git a/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt b/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt
index 09591ec74af..27c3b6b5263 100644
--- a/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt
+++ b/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt
@@ -1,4 +1,4 @@
 anomalib==0.6.0
-openvino==2024.6
+openvino==2025.0
 setuptools<=72.1.0
 numpy<2
diff --git a/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt b/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt
index bd84e2e1e51..a6573310684 100644
--- a/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt
+++ b/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt
@@ -2,4 +2,4 @@ torchvision
 tqdm
 scikit-learn
 fastdownload
-openvino==2024.6
+openvino==2025.0
diff --git a/examples/post_training_quantization/openvino/yolov8/requirements.txt b/examples/post_training_quantization/openvino/yolov8/requirements.txt
index b9e7f70bcce..cdab8e77375 100644
--- a/examples/post_training_quantization/openvino/yolov8/requirements.txt
+++ b/examples/post_training_quantization/openvino/yolov8/requirements.txt
@@ -1,3 +1,3 @@
 ultralytics==8.3.22
 onnx==1.17.0
-openvino==2024.6
+openvino==2025.0
diff --git a/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt b/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt
index b9e7f70bcce..cdab8e77375 100644
--- a/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt
+++ b/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt
@@ -1,3 +1,3 @@
 ultralytics==8.3.22
 onnx==1.17.0
-openvino==2024.6
+openvino==2025.0
diff --git a/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt b/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt
index 5777ac892c3..c7b8e786f15 100644
--- a/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt
+++ b/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt
@@ -1,4 +1,4 @@
 tensorflow==2.15.1
 tensorflow-datasets
 tqdm
-openvino==2024.6
+openvino==2025.0
diff --git a/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt b/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt
index a67d4f94e96..f71d6f09f7f 100644
--- a/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt
+++ b/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt
@@ -1,5 +1,5 @@
 fastdownload==0.0.7
-openvino==2024.6
+openvino==2025.0
 scikit-learn
 torch==2.5.1
 torchvision==0.20.1
diff --git a/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt b/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt
index ff7ab22d643..9414e01a83f 100644
--- a/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt
+++ b/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt
@@ -1,6 +1,6 @@
 fastdownload==0.0.7
 onnx==1.17.0
-openvino==2024.6
+openvino==2025.0
 pycocotools==2.0.7
 torch==2.5.1
 torchmetrics==1.0.1
diff --git a/examples/post_training_quantization/torch_fx/resnet18/requirements.txt b/examples/post_training_quantization/torch_fx/resnet18/requirements.txt
index b46777b6ed6..0caf6f3432f 100644
--- a/examples/post_training_quantization/torch_fx/resnet18/requirements.txt
+++ b/examples/post_training_quantization/torch_fx/resnet18/requirements.txt
@@ -1,4 +1,4 @@
 fastdownload==0.0.7
-openvino==2024.6
+openvino==2025.0
 torch==2.5.1
 torchvision==0.20.1
diff --git a/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt b/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt
index e1e3c89fe46..4d2de66f2a0 100644
--- a/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt
+++ b/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt
@@ -2,4 +2,4 @@ tensorflow~=2.12.0; python_version < '3.9'
 tensorflow~=2.15.1; python_version >= '3.9'
 tensorflow-datasets
 tqdm
-openvino==2024.6
+openvino==2025.0
diff --git a/examples/quantization_aware_training/torch/resnet18/requirements.txt b/examples/quantization_aware_training/torch/resnet18/requirements.txt
index 76c9d3cad75..fff0bd5122f 100644
--- a/examples/quantization_aware_training/torch/resnet18/requirements.txt
+++ b/examples/quantization_aware_training/torch/resnet18/requirements.txt
@@ -1,5 +1,5 @@
 fastdownload==0.0.7
-openvino==2024.6
+openvino==2025.0
 torch==2.5.1
 torchvision==0.20.1
 setuptools<=72.1.0