- Post-Training Quantization (PTQ) (OpenVINO, PyTorch, TorchFX, ONNX, TensorFlow)
- Symmetric and asymmetric quantization modes
- Signed and unsigned
- Per tensor/per channel
- Each backend supports export to the OpenVINO format
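The quantization modes listed above can be illustrated with a minimal sketch in plain Python (this is not the NNCF API; function names here are hypothetical). Symmetric quantization maps values onto a signed grid centered at zero, while asymmetric quantization shifts an unsigned grid with a zero point:

```python
# Hypothetical sketch of per-tensor 8-bit quantization, symmetric
# (signed, zero point fixed at 0) vs. asymmetric (unsigned, with a
# zero point). Not the NNCF implementation.

def quantize_symmetric(values, num_bits=8):
    """Signed symmetric quantization: the zero point is fixed at 0."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def quantize_asymmetric(values, num_bits=8):
    """Unsigned asymmetric quantization: a zero point shifts the range."""
    qmax = 2 ** num_bits - 1                       # 255 for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q_sym, s_sym = quantize_symmetric(weights)
q_asym, s_asym, zp = quantize_asymmetric(weights)
```

Per-channel quantization applies the same formulas, but computes a separate `scale` (and zero point) for each output channel instead of one per tensor.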
- Weights compression (OpenVINO, PyTorch, TorchFX)
- Symmetric 8-bit compression mode
- Symmetric and asymmetric 4-bit compression modes
- NF4 compression mode
- E2M1 weights with E8M0 scales compression mode
- Mixed precision weights compression
- Grouped weights compression
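Grouped compression can be sketched as follows (a hypothetical illustration in plain Python, not the NNCF implementation): each group of consecutive weights gets its own scale, so an outlier only degrades the precision of its own group rather than the whole channel:

```python
# Hypothetical sketch of grouped symmetric 4-bit weight compression.
# Each group of `group_size` weights is quantized with its own scale.

def compress_weights_int4(weights, group_size=4):
    qmax = 7                                    # signed 4-bit range is [-8, 7]
    packed = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / qmax or 1.0
        q = [max(-8, min(7, round(w / scale))) for w in group]
        packed.append((q, scale))
    return packed

def decompress(packed):
    # Restore approximate float weights from (int4 values, scale) pairs.
    return [q * scale for group, scale in packed for q in group]

w = [0.1, -0.2, 0.05, 0.15, 2.0, -1.5, 0.5, 1.0]
restored = decompress(compress_weights_int4(w))
```

The first group (small magnitudes) gets a much finer scale than the second, which is the point of grouping.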
- Quantization Aware Training (QAT) (PyTorch)
- Training of a quantized model after Post-Training Quantization
- Symmetric and asymmetric quantization modes
- Signed and unsigned
- Per tensor/per channel
- Exports to OpenVINO format
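The core operation behind QAT is "fake quantization": the forward pass rounds values to the quantization grid and immediately dequantizes them, so training adapts the model to the error real quantization will introduce. A minimal sketch (hypothetical names, not the operators NNCF inserts):

```python
# Hypothetical sketch of a fake-quantize (quantize-dequantize) step
# as used in QAT forward passes. Not the NNCF implementation.

def fake_quantize(x, scale, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))   # quantize to the int grid
    return q * scale                             # dequantize back to float

scale = 0.1
out = fake_quantize(0.234, scale)   # snaps to the nearest grid point, 0.2
```

During training, gradients are typically passed through the rounding step unchanged (the straight-through estimator), which is what makes the quantized model trainable.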
Each compression method in this section takes its own set of hyperparameters, organized as a dictionary and stored in a JSON configuration file that is deserialized when training starts. Compression methods can be applied separately or combined to produce sparse, quantized, or sparse-and-quantized models. For more information about the configuration, refer to the samples.
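A schematic configuration combining two methods might look like this (the exact schema and available fields are defined by the samples and the configuration reference; this fragment only illustrates the shape):

```json
{
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": [
        {"algorithm": "quantization"},
        {"algorithm": "magnitude_sparsity"}
    ]
}
```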
- Legacy Quantization Aware Training (QAT) (PyTorch, TensorFlow)
- Symmetric and asymmetric quantization modes
- Signed and unsigned
- Per tensor/per channel
- Exports to OpenVINO-supported FakeQuantize ONNX nodes
- Arbitrary bitwidth
- Mixed-bitwidth quantization
- Automatic bitwidth assignment based on HAWQ
- Automatic quantization parameter selection and activation quantizer setup based on HW config preset
- Automatic bitwidth assignment mode (AutoQ), based on HAQ, a deep reinforcement learning algorithm that selects the best mixed-precision configuration for a given quality metric and hardware type
- Unstructured sparsity (PyTorch, TensorFlow)
- Magnitude sparsity
- Regularization-based (RB) sparsity
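Magnitude sparsity, the simpler of the two modes, can be sketched in a few lines (a hypothetical illustration, not the NNCF implementation): weights whose absolute value falls below a threshold are zeroed, with the threshold chosen to hit a target sparsity level:

```python
# Hypothetical sketch of magnitude sparsity: zero out the smallest
# weights until the target fraction of zeros is reached.

def magnitude_sparsify(weights, target_sparsity=0.5):
    ranked = sorted(abs(w) for w in weights)
    # The cutoff is the magnitude below which weights are dropped.
    cutoff = ranked[min(int(target_sparsity * len(weights)), len(ranked) - 1)]
    return [0.0 if abs(w) < cutoff else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
sparse = magnitude_sparsify(w, target_sparsity=0.5)
```

Regularization-based sparsity reaches the same end state differently: a loss term pushes weights toward zero during training instead of cutting them off by a fixed threshold.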
- Filter pruning (Structured sparsity) (PyTorch, TensorFlow)
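Unlike unstructured sparsity, filter pruning removes whole convolution filters, so the layer actually shrinks. A common criterion is filter magnitude; the sketch below (hypothetical, not the NNCF implementation) ranks filters by L2 norm and drops the weakest fraction:

```python
# Hypothetical sketch of structured filter pruning by L2 norm.

def prune_filters(filters, prune_ratio=0.5):
    """filters: list of filters, each given as a flat list of weights."""
    norm = lambda f: sum(w * w for w in f) ** 0.5
    ranked = sorted(filters, key=norm, reverse=True)
    n_keep = len(filters) - int(prune_ratio * len(filters))
    kept = ranked[:n_keep]
    # Preserve the original filter order among the survivors.
    return [f for f in filters if any(f is k for k in kept)]

conv = [[0.9, 0.8], [0.01, 0.02], [0.5, -0.6], [0.0, 0.1]]
pruned = prune_filters(conv, prune_ratio=0.5)
```

Because entire filters disappear, the output channel count drops too, which speeds up inference on ordinary hardware without requiring sparse kernels.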