v2.5.0
Post-training Quantization:
Features:
- Official release of OpenVINO framework support.
- Ported NNCF OpenVINO backend to use the nGraph representation of OpenVINO models.
- Changed dependencies of NNCF OpenVINO backend. It now depends on the `openvino` package and not on the `openvino-dev` package.
- Added GRU/LSTM quantization support.
- Added quantizer scales unification.
- Added support for models with 3D and 5D Depthwise convolution.
- Added FP16 OpenVINO models support.
- Added `overflow_fix` parameter (for `quantize(...)` & `quantize_with_accuracy_control(...)` methods) support & functionality. It improves accuracy of the optimized model on affected devices. More details in the Quantization section; see the sketch after this list for how the parameter can be passed.
- (OpenVINO) Added support for in-place statistics collection (reduces memory footprint during optimization).
- (OpenVINO) Added Quantization with accuracy control algorithm.
- (OpenVINO) Added YOLOv8 examples for `quantize(...)` & `quantize_with_accuracy_control(...)` methods.
- (PyTorch) Added min-max quantization algorithm as experimental.
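Below is a minimal sketch of passing `overflow_fix` to `nncf.quantize(...)`. The routing through `AdvancedQuantizationParameters` and the `OverflowFix` enum is an assumption based on NNCF's public API; `model`, `data_items`, and `transform_fn` are placeholders for your own objects.

```python
import nncf
# Assumed location of the advanced-parameters helpers; verify against your NNCF version.
from nncf.quantization.advanced_parameters import (
    AdvancedQuantizationParameters,
    OverflowFix,
)

# Wrap your calibration samples; transform_fn maps one sample to model inputs.
calibration_dataset = nncf.Dataset(data_items, transform_fn)

quantized_model = nncf.quantize(
    model,  # your OpenVINO/ONNX/PyTorch model
    calibration_dataset,
    advanced_parameters=AdvancedQuantizationParameters(
        overflow_fix=OverflowFix.ENABLE,  # assumed enum value; check docs for alternatives
    ),
)
```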
Fixes:
- Fixed `ignored_scope` attribute behaviour for weights. Weighted layers are now correctly excluded from the optimization scope (see the sketch after this list).
- (ONNX) Added a check of the ONNX opset version in `nncf.quantize(...)`. Models with opset < 13 are now optimized correctly in per-tensor quantization.
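A hedged sketch of excluding layers (and, with this fix, their weights) via `ignored_scope`; the node name and type below are purely illustrative, and `model`/`calibration_dataset` are your own objects.

```python
import nncf

quantized_model = nncf.quantize(
    model,                # your framework model
    calibration_dataset,  # an nncf.Dataset built from calibration samples
    ignored_scope=nncf.IgnoredScope(
        names=["/model/head/Conv"],  # hypothetical node name, for illustration only
        types=["Mul"],               # skip all nodes of this operation type
    ),
)
```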
Improvements:
- Improved the statistics collection process (weight statistics are now collected only once).
- (PyTorch, OpenVINO, ONNX) Introduced unified quantizer parameters calculation.
Known issues:
- `quantize(...)` method can generate inaccurate int8 results for models with the DenseNet-like architecture. Use `quantize_with_accuracy_control(...)` in such a case (see the sketch after this list).
- `quantize(...)` method can hang on models with transformer architecture when the `fast_bias_correction` optional parameter is set to False. Don't set it to False, or use `quantize_with_accuracy_control(...)` in such a case.
- `quantize(...)` method can generate inaccurate int8 results for models with the MobileNet-like architecture on non-VNNI machines.
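For reference, a minimal sketch of the `quantize_with_accuracy_control(...)` fallback recommended above, assuming the OpenVINO-backend signature with a user-supplied validation function; the metric and `max_drop` value are illustrative.

```python
import nncf

def validate(model, validation_dataset) -> float:
    """Run your own evaluation and return a single float metric."""
    ...  # placeholder: compute e.g. accuracy over validation_dataset

quantized_model = nncf.quantize_with_accuracy_control(
    model,
    calibration_dataset=calibration_dataset,  # nncf.Dataset of calibration samples
    validation_dataset=validation_dataset,    # nncf.Dataset of validation samples
    validation_fn=validate,
    max_drop=0.01,  # tolerate at most this metric drop (illustrative value)
)
```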
Compression-aware training:
New Features:
- Introduced automated structured pruning algorithm for JPQD with support for BERT, Wave2VecV2, Swin, ViT, DistilBERT, CLIP, and MobileBERT models.
- Added `nncf.common.utils.patcher.Patcher` - this class can be used to patch methods on live PyTorch model objects with wrappers such as `nncf.torch.dynamic_graph.context.no_nncf_trace` when doing so in the model code is not possible (e.g. if the model comes from an external library package).
- Compression controllers of the `nncf.api.compression.CompressionAlgorithmController` class now have a `.strip()` method that returns the compressed model object with as many custom NNCF additions removed as possible while preserving the functioning of the model object as a compressed model (see the sketch after this list).
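A minimal sketch of calling the new `.strip()` method at the end of a standard PyTorch compression-aware training flow; the config path and training loop are placeholders.

```python
import torch
from nncf import NNCFConfig
from nncf.torch import create_compressed_model

nncf_config = NNCFConfig.from_json("nncf_config.json")  # placeholder path
model = ...  # your torch.nn.Module

compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# ... run your compression-aware training loop on compressed_model ...

# New in this release: return the model with as many NNCF-specific additions
# removed as possible while keeping compressed behaviour intact.
stripped_model = compression_ctrl.strip()
torch.save(stripped_model.state_dict(), "stripped_model.pth")
```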
Fixes:
- Fixed statistics computation for pruned layers.
- (PyTorch) Fixed traced tensor handling to support YOLOv8 from Ultralytics.
Improvements:
- Extension of attributes (`transpose`/`permute`/`getitem`) for the pruning node selector.
- NNCFNetwork was refactored from a wrapper approach to a mixin-like approach.
- Added average pool 3d-like ops to pruning mask.
- Added Conv3d for overflow fix.
- `nncf.set_log_file(...)` can now be used to set the location of the NNCF log file (see the sketch after this list).
- (PyTorch) Added support for pruning of the `torch.nn.functional.pad` operation.
- (PyTorch) Added `torch.baddbmm` as an alias for the matmul metatype for quantization purposes.
- (PyTorch) Added config file for ResNet18 accuracy-aware pruning + quantization on CIFAR10.
- (PyTorch) Fixed JIT-traceable PyTorch models with internal patching.
- (PyTorch) Added the `__matmul__` magic function to the list of patched ops (for SwinTransformer by Microsoft).
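A one-line sketch of the new logging helper; the path below is a placeholder.

```python
import nncf

# Redirect NNCF's log output to a file of your choosing (illustrative path).
nncf.set_log_file("logs/nncf.log")
```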
Requirements:
- Updated ONNX version (1.13).
- Updated TensorFlow version (2.11).
General changes:
- Added Windows support for NNCF.