- Post-Training Quantization (PTQ) (OpenVINO, PyTorch, TorchFX, ONNX, TensorFlow)
- Symmetric and asymmetric quantization modes
- Signed and unsigned
- Per tensor/per channel
- Each backend supports export to the OpenVINO format
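The quantization modes listed above can be illustrated with a minimal sketch in plain Python (this is not the NNCF API; function names here are hypothetical). Symmetric quantization maps values onto a signed grid centered at zero, while asymmetric quantization shifts an unsigned grid with a zero point:

```python
# Hypothetical sketch of per-tensor 8-bit quantization, symmetric
# (signed, zero point fixed at 0) vs. asymmetric (unsigned, with a
# zero point). Not the NNCF implementation.

def quantize_symmetric(values, num_bits=8):
    """Signed symmetric quantization: the zero point is fixed at 0."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def quantize_asymmetric(values, num_bits=8):
    """Unsigned asymmetric quantization: a zero point shifts the range."""
    qmax = 2 ** num_bits - 1                       # 255 for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q_sym, s_sym = quantize_symmetric(weights)
q_asym, s_asym, zp = quantize_asymmetric(weights)
```

Per-channel quantization applies the same formulas, but computes a separate `scale` (and zero point) for each output channel instead of one per tensor.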
- Weights compression (OpenVINO, PyTorch, TorchFX)
- Symmetric 8-bit compression mode
- Symmetric and asymmetric 4-bit compression modes
- NF4 compression mode
- E2M1 weights with E8M0 scales compression mode
- Mixed precision weights compression
- Grouped weights compression
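Grouped compression can be sketched as follows (a hypothetical illustration in plain Python, not the NNCF implementation): each group of consecutive weights gets its own scale, so an outlier only degrades the precision of its own group rather than the whole channel:

```python
# Hypothetical sketch of grouped symmetric 4-bit weight compression.
# Each group of `group_size` weights is quantized with its own scale.

def compress_weights_int4(weights, group_size=4):
    qmax = 7                                    # signed 4-bit range is [-8, 7]
    packed = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / qmax or 1.0
        q = [max(-8, min(7, round(w / scale))) for w in group]
        packed.append((q, scale))
    return packed

def decompress(packed):
    # Restore approximate float weights from (int4 values, scale) pairs.
    return [q * scale for group, scale in packed for q in group]

w = [0.1, -0.2, 0.05, 0.15, 2.0, -1.5, 0.5, 1.0]
restored = decompress(compress_weights_int4(w))
```

The first group (small magnitudes) gets a much finer scale than the second, which is the point of grouping.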
- Quantization Aware Training (QAT) (PyTorch)
- Training of a quantized model after Post-Training Quantization
- Symmetric and asymmetric quantization modes
- Signed and unsigned
- Per tensor/per channel
- Exports to OpenVINO format
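The core operation behind QAT is "fake quantization": the forward pass rounds values to the quantization grid and immediately dequantizes them, so training adapts the model to the error real quantization will introduce. A minimal sketch (hypothetical names, not the operators NNCF inserts):

```python
# Hypothetical sketch of a fake-quantize (quantize-dequantize) step
# as used in QAT forward passes. Not the NNCF implementation.

def fake_quantize(x, scale, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))   # quantize to the int grid
    return q * scale                             # dequantize back to float

scale = 0.1
out = fake_quantize(0.234, scale)   # snaps to the nearest grid point, 0.2
```

During training, gradients are typically passed through the rounding step unchanged (the straight-through estimator), which is what makes the quantized model trainable.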
Each compression method in this section takes its own set of hyperparameters, organized as a dictionary and stored in a JSON configuration file that is deserialized when training starts. Compression methods can be applied separately or combined to produce sparse, quantized, or sparse-and-quantized models. For more information about the configuration, refer to the samples.
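A schematic configuration combining two methods might look like this (the exact schema and available fields are defined by the samples and the configuration reference; this fragment only illustrates the shape):

```json
{
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": [
        {"algorithm": "quantization"},
        {"algorithm": "magnitude_sparsity"}
    ]
}
```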
- Legacy Quantization Aware Training (QAT) (PyTorch, TensorFlow)
- Symmetric and asymmetric quantization modes
- Signed and unsigned
- Per tensor/per channel
- Exports to OpenVINO-supported FakeQuantize ONNX nodes
- Arbitrary bitwidth
- Mixed-bitwidth quantization
- Automatic bitwidth assignment based on HAWQ
- Automatic quantization parameter selection and activation quantizer setup based on HW config preset
- Automatic bitwidth assignment mode (AutoQ), based on HAQ, a deep reinforcement learning algorithm that selects the best mixed-precision configuration for a given quality metric and hardware type
- Unstructured sparsity (PyTorch, TensorFlow)
- Magnitude sparsity
- Regularization-based (RB) sparsity
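Magnitude sparsity, the simpler of the two modes, can be sketched in a few lines (a hypothetical illustration, not the NNCF implementation): weights whose absolute value falls below a threshold are zeroed, with the threshold chosen to hit a target sparsity level:

```python
# Hypothetical sketch of magnitude sparsity: zero out the smallest
# weights until the target fraction of zeros is reached.

def magnitude_sparsify(weights, target_sparsity=0.5):
    ranked = sorted(abs(w) for w in weights)
    # The cutoff is the magnitude below which weights are dropped.
    cutoff = ranked[min(int(target_sparsity * len(weights)), len(ranked) - 1)]
    return [0.0 if abs(w) < cutoff else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
sparse = magnitude_sparsify(w, target_sparsity=0.5)
```

Regularization-based sparsity reaches the same end state differently: a loss term pushes weights toward zero during training instead of cutting them off by a fixed threshold.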
- Filter pruning (Structured sparsity) (PyTorch, TensorFlow)
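Unlike unstructured sparsity, filter pruning removes whole convolution filters, so the layer actually shrinks. A common criterion is filter magnitude; the sketch below (hypothetical, not the NNCF implementation) ranks filters by L2 norm and drops the weakest fraction:

```python
# Hypothetical sketch of structured filter pruning by L2 norm.

def prune_filters(filters, prune_ratio=0.5):
    """filters: list of filters, each given as a flat list of weights."""
    norm = lambda f: sum(w * w for w in f) ** 0.5
    ranked = sorted(filters, key=norm, reverse=True)
    n_keep = len(filters) - int(prune_ratio * len(filters))
    kept = ranked[:n_keep]
    # Preserve the original filter order among the survivors.
    return [f for f in filters if any(f is k for k in kept)]

conv = [[0.9, 0.8], [0.01, 0.02], [0.5, -0.6], [0.0, 0.1]]
pruned = prune_filters(conv, prune_ratio=0.5)
```

Because entire filters disappear, the output channel count drops too, which speeds up inference on ordinary hardware without requiring sparse kernels.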