Optimizing TensorFlow models with OpenVINO's Neural Network Compression Framework (NNCF) using 8-bit quantization
This tutorial demonstrates how to use NNCF 8-bit quantization to optimize a TensorFlow model for inference with the OpenVINO Toolkit. For more advanced usage, refer to these examples.
To make downloading and training fast, we use a ResNet-18 model with the Imagenette dataset. Imagenette is a subset of 10 easily classified classes from the ImageNet dataset.
The Imagenette dataset can be downloaded from here.
This tutorial consists of the following steps:
- Fine-tune the FP32 model
- Transform the original FP32 model to INT8 (see the quantization sketch after this list)
- Fine-tune the INT8 model to restore accuracy
- Export the optimized and original models to Frozen Graph format and then to OpenVINO IR (see the export sketch below)
- Measure and compare the performance of the models (see the benchmark commands below)
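As a minimal sketch of the transform-to-INT8 and fine-tuning steps above, the snippet below applies an NNCF 8-bit quantization config to a Keras model. The `model`, `train_dataset`, and `val_dataset` objects, the input shape, and the hyperparameters are illustrative assumptions, not fixed by this tutorial:

```python
import tensorflow as tf
from nncf import NNCFConfig
from nncf.tensorflow.helpers.model_creation import create_compressed_model
from nncf.tensorflow.initialization import register_default_init_args

# Illustrative NNCF config: 8-bit quantization with an assumed input shape.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 64, 64, 3]},
    "compression": {"algorithm": "quantization"},
})

# Let NNCF see training data to initialize the quantization ranges.
nncf_config = register_default_init_args(nncf_config, train_dataset, batch_size=128)

# `model` is the fine-tuned FP32 Keras model, e.g. loaded with
# model.load_weights("model/ResNet-18_fp32.h5").
compression_ctrl, int8_model = create_compressed_model(model, nncf_config)

# Fine-tune the INT8 model briefly to restore accuracy.
int8_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)
int8_model.fit(train_dataset, validation_data=val_dataset, epochs=2)
```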
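Exporting a Keras model to a frozen graph in TensorFlow 2 is typically done by tracing a concrete function and folding its variables into constants. A sketch, assuming the same input shape and illustrative file names as above:

```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Trace the model with a fixed input signature (the shape is an assumption).
concrete_fn = tf.function(lambda x: int8_model(x)).get_concrete_function(
    tf.TensorSpec([1, 64, 64, 3], tf.float32)
)

# Fold variables into constants and serialize the frozen GraphDef.
frozen_fn = convert_variables_to_constants_v2(concrete_fn)
tf.io.write_graph(frozen_fn.graph.as_graph_def(), "output", "resnet18_int8.pb", as_text=False)
```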
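The frozen graph can then be converted to OpenVINO IR with Model Optimizer and benchmarked with `benchmark_app`, both of which are provided by the `openvino-dev` package. The file names below match the sketch above and are assumptions:

```bash
mo --input_model output/resnet18_int8.pb --input_shape "[1,64,64,3]" --output_dir output
benchmark_app -m output/resnet18_int8.xml -d CPU -t 15
```

Running the same two commands on the exported FP32 frozen graph gives the baseline numbers for the comparison.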
Set up the environment and install the required packages:

```bash
conda create -n venv_demo python=3.7 -y
conda activate venv_demo
pip install tensorflow==2.4.2
pip install openvino-dev==2021.4.2
pip install nncf
```
The original ResNet-18 model weights file is available upon request.
```bash
mkdir model
mkdir output
```
Please place the original weights file `ResNet-18_fp32.h5` in the `model` directory, and extract the Imagenette dataset into the `dataset` directory.
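Once extracted, the dataset can be loaded with standard Keras utilities. A minimal sketch, assuming the `imagenette2-320` directory layout of the official archive and an illustrative input size:

```python
import tensorflow as tf

IMG_SIZE = (64, 64)   # illustrative input size; match the model's expected input
BATCH_SIZE = 128

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "dataset/imagenette2-320/train",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",
)
val_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "dataset/imagenette2-320/val",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",
)
```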
If you have not done so already, please follow the Installation Guide to install all required dependencies.