[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766
Labels: enhancement (New feature or request)
Comments
alexsu52 changed the title from "Torch FX Quantization" to "Torch FX/PyTorch 2 Export Quantization" on Jun 27, 2024
@daniil-lyakhov, please analyze this feature request and open issues as sub-tasks of it.

I suggest introducing the following API in NNCF to support third-party quantizers and better align with the PyTorch 2 Export Quantization API:
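As a rough, non-authoritative sketch of such an entry point (the name `quantize_pt2e` and the third-party `Quantizer` argument follow the later commits in this thread; the exact signature is an assumption):

```python
import torch
from torch.ao.quantization.quantizer import Quantizer

import nncf


# Hypothetical sketch only: the exact signature is an assumption modeled on
# the quantize_pt2e work referenced later in this thread.
def quantize_pt2e(
    model: torch.fx.GraphModule,
    quantizer: Quantizer,  # any torch.ao quantizer, e.g. X86InductorQuantizer
    calibration_dataset: nncf.Dataset,
) -> torch.fx.GraphModule:
    """Quantize a PyTorch 2 Export model with NNCF algorithms, driven by the
    annotations produced by the given torch.ao Quantizer."""
    ...
```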
MaximProshin changed the title from "Torch FX/PyTorch 2 Export Quantization" to "[TorchFX] Torch FX/PyTorch 2 Export Quantization" on Jul 5, 2024
AlexanderDokuchaev pushed a commit that referenced this issue on Aug 5, 2024

### Changes
Added a test in tests/torch/fx/test_models.py that compares the quantized graph with a reference quantized graph.
### Reason for changes
To check that the graph was quantized correctly.
### Ticket
#2766
### Tests
test_quantized_model() was added in tests/torch/fx/test_models.py.
KodiaqQ pushed a commit that referenced this issue on Aug 5, 2024

…izers (#2854)
### Changes
Quantizer merge logic is updated to check that all output branches are quantized before quantizers are merged and propagated up.
### Reason for changes
To prevent merging of quantizers in the case of the ScaledDotProductAttention op, which should have quantizers on input ports [0, 1] and shouldn't have a quantizer on the third input port.
### Related tickets
148211, #2766
### Tests
* Common solver test for ScaledDotProductAttention branch merging and quantization initialization
* Graph tests for torch/ov backends
AlexanderDokuchaev pushed a commit that referenced this issue on Aug 7, 2024

### Changes
Conformance test for resnet18.
### Reason for changes
To extend the testing scope for the TorchFX backend.
### Related tickets
#2766
### Tests
post_training_quantization/442 is successful.
AlexanderDokuchaev pushed a commit that referenced this issue on Aug 9, 2024

### Changes
Torch FX pre-hook insertion support.
### Reason for changes
To enable vit_b_16 quantization.
### Related tickets
#2766
### Tests
test_quantized_models is updated with vit_b_16 and swin_v2_s.
AlexanderDokuchaev pushed a commit that referenced this issue on Aug 13, 2024

### Changes
Constant linear layers support.
### Reason for changes
To support swin_v2_s FBC.
### Related tickets
#2766
### Tests
* Build post_training_quantization/444/ finished successfully
* Unit test `test_model_transformer.test_model_extraction` is present
KodiaqQ pushed a commit that referenced this issue on Aug 16, 2024

### Changes
TorchFX SmoothQuant backend implementation:
* module_insertion_transformation_builder is introduced
* Transformation requires names for new modules and nodes
* vit_b_16 is introduced in the conformance tests
### Reason for changes
To improve metrics of the quantized swin_v2_s and vit_b_16 models:
* To insert SQ multiply nodes into the graph
* To make node names human-readable and consistent
* To check the SQ algorithm E2E
### Related tickets
#2766
### Tests
* Smooth quant test template is implemented for the TorchFX backend
* Conformance test post_training_quantization/446/ is successful
* Test models check SQ multiplies for the swin_v2_s and vit_b_16 models
alexsu52 added a commit that referenced this issue on Nov 7, 2024

### Changes
Constant folding is enabled by default in the TorchFX backend.
### Reason for changes
To align quantizer placement between OV and TorchFX.
### Related tickets
#2766
### Tests
* test_constant_folding
* test_constant_folding_with_constraints
* test_models.py references are updated
* post_training_quantization/535/ finished successfully

Co-authored-by: Alexander Suslov <[email protected]>
Co-authored-by: Aamir Nazir <[email protected]>
alexsu52 pushed a commit that referenced this issue on Nov 14, 2024

### Changes
* TorchFX unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training`. ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY.
* BC types for `fuse_bn_node` are updated
* NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three)
* The model extractor does not traverse down from constants, to prevent redundant nodes in the extracted model when the constant is shared
* `shared_constants_unification_transformation` is removed
* Tests which require `capture_pre_autograd_graph` are removed
### Reason for changes
* To migrate to the latest and recommended export method for the TorchFX backend
### Related tickets
#2766
### Tests
* test_shared_constants_unification_not_connected_const
* post_training_quantization/540/ finished successfully
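For context, a minimal sketch of the export migration described above (the model and inputs are placeholders):

```python
import torch
import torchvision.models as models

model = models.resnet18().eval()
example_inputs = (torch.ones(1, 3, 224, 224),)

# Old, deprecated capture:
# from torch._export import capture_pre_autograd_graph
# fx_model = capture_pre_autograd_graph(model, example_inputs)

# New, recommended capture for the TorchFX backend:
exported_program = torch.export.export_for_training(model, example_inputs)
fx_model = exported_program.module()  # torch.fx.GraphModule
```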
daniil-lyakhov added a commit to daniil-lyakhov/nncf that referenced this issue on Nov 14, 2024

…it#3075)
### Changes
* TorchFX unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training`. ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY.
* BC types for `fuse_bn_node` are updated
* NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three)
* The model extractor does not traverse down from constants, to prevent redundant nodes in the extracted model when the constant is shared
* `shared_constants_unification_transformation` is removed
* Tests which require `capture_pre_autograd_graph` are removed
### Reason for changes
* To migrate to the latest and recommended export method for the TorchFX backend
### Related tickets
openvinotoolkit#2766
### Tests
* test_shared_constants_unification_not_connected_const
* post_training_quantization/540/ finished successfully
KodiaqQ pushed a commit that referenced this issue on Nov 14, 2024

PR #3075 to the release branch:
### Changes
* TorchFX unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training`. ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY.
* BC types for `fuse_bn_node` are updated
* NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three)
* The model extractor does not traverse down from constants, to prevent redundant nodes in the extracted model when the constant is shared
* `shared_constants_unification_transformation` is removed
* Tests which require `capture_pre_autograd_graph` are removed
### Reason for changes
* To migrate to the latest and recommended export method for the TorchFX backend
### Related tickets
#2766
### Tests
* test_shared_constants_unification_not_connected_const
* post_training_quantization/540/ finished successfully
alexsu52 pushed a commit that referenced this issue on Nov 15, 2024

### Changes
* The main README.md, Usage.md, and post-training quantization docs are updated with info about TorchFX.
### Reason for changes
* To reflect the new experimental TorchFX features in the docs.
### Related tickets
#2766
alexsu52 pushed a commit that referenced this issue on Nov 25, 2024

### Changes
* The Torch SDPA pattern is updated.
* Since the concat node has its input nodes in the format `args=([inp_1, ..., inp_n], dim)`, it must be treated differently. Retrieving concat inputs by input port id is now supported in each TorchFX transformation.
### Reason for changes
* To support quantization of ultralytics/yolo11n in the TorchFX backend.
### Related tickets
#2766, 157032
### Tests
* `tests/torch/fx/test_model_transformer.py` and `tests/torch/fx/test_compress_weights.py` are updated to check all cases with the concat node. All `.dot`/`.json` references were checked manually.
* `tests/torch/fx/test_models.py` is updated with a `YOLO11N_SDPABlock` synthetic model to check the correctness of SDPA pattern matching.
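To illustrate the concat special case: `torch.cat` packs its tensor inputs into a single list argument, so looking up an input by port id has to index into that list. A hypothetical helper (an illustration, not the actual NNCF code):

```python
import torch
from torch.fx import Node

# Targets that pack their inputs into a list first argument; in graphs captured
# via torch.export, concat appears as the aten overload rather than torch.cat.
CAT_TARGETS = (torch.cat, torch.ops.aten.cat.default)


def get_input_by_port(node: Node, input_port_id: int) -> Node:
    # For concat, args = ([inp_1, ..., inp_n], dim): input ports index into
    # the list in args[0] rather than into node.args directly.
    if node.target in CAT_TARGETS:
        return node.args[0][input_port_id]
    return node.args[input_port_id]
```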
alexsu52 pushed a commit that referenced this issue on Nov 26, 2024

### Changes
All `capture_pre_autograd_graph` calls in the conformance test were replaced by `torch.export.export_for_training`.
### Reason for changes
To remove the deprecated `capture_pre_autograd_graph` from the conformance test.
### Related tickets
#2766
### Tests
post_training_quantization/555/ has finished successfully.
daniil-lyakhov added a commit to daniil-lyakhov/nncf that referenced this issue on Dec 2, 2024

…notoolkit#3078)
### Changes
All `capture_pre_autograd_graph` calls in the conformance test were replaced by `torch.export.export_for_training`.
### Reason for changes
To remove the deprecated `capture_pre_autograd_graph` from the conformance test.
### Related tickets
openvinotoolkit#2766
### Tests
post_training_quantization/555/ has finished successfully.
alexsu52 pushed a commit that referenced this issue on Dec 4, 2024

### Changes
* Bias fusing is removed from the default transformations
* `constant_folding` is updated to remove inplace operations without users
* `extract_model` is updated to support the original model output as a subgraph output
### Reason for changes
To make it possible to apply quantization the same way it is done by X86Quantizer.
### Related tickets
#2766, 110985
### Tests
* All int8 references are updated and checked manually
* `test_constant_folding` and `test_constant_folding_with_constraints` are updated with a constant subgraph which contains an inplace op (`relu_`)
* `test_model_extraction_with_original_output` is introduced
* Conformance test post_training_quantization/557 has finished successfully
alexsu52 pushed a commit that referenced this issue on Dec 4, 2024

### Changes
Folded constants do not require gradient.
### Reason for changes
* To unify all model constants/buffers
* To make the compressed model deepcopy-able
### Related tickets
#2766
### Tests
`test_constant_folding` is updated.
nikita-savelyevv pushed a commit to nikita-savelyevv/nncf that referenced this issue on Dec 11, 2024

…it#3075)
### Changes
* TorchFX unit tests are moved from `torch._export.capture_pre_autograd_graph` to `torch.export.export_for_training`. ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY.
* BC types for `fuse_bn_node` are updated
* NNCFGraphBuilder is updated to support a batch-norm type with only one output node (instead of three)
* The model extractor does not traverse down from constants, to prevent redundant nodes in the extracted model when the constant is shared
* `shared_constants_unification_transformation` is removed
* Tests which require `capture_pre_autograd_graph` are removed
### Reason for changes
* To migrate to the latest and recommended export method for the TorchFX backend
### Related tickets
openvinotoolkit#2766
### Tests
* test_shared_constants_unification_not_connected_const
* post_training_quantization/540/ finished successfully
nikita-savelyevv pushed a commit to nikita-savelyevv/nncf that referenced this issue on Dec 11, 2024

### Changes
* The main README.md, Usage.md, and post-training quantization docs are updated with info about TorchFX.
### Reason for changes
* To reflect the new experimental TorchFX features in the docs.
### Related tickets
openvinotoolkit#2766
alexsu52 pushed a commit that referenced this issue on Jan 21, 2025

### Changes
Introduction of the `quantize_pt2e` method.
### Related tickets
#2766
### Tests
Graph tests: `tests/torch/fx/test_quantizer.py`
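A minimal usage sketch of the new entry point (the import path and exact signature are assumptions; see `tests/torch/fx/test_quantizer.py` for the tested flows):

```python
import torch
import torchvision.models as models
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

import nncf
# Assumed import path for the experimental API:
from nncf.experimental.torch.fx import quantize_pt2e

model = models.resnet18().eval()
example_inputs = (torch.ones(1, 3, 224, 224),)
fx_model = torch.export.export_for_training(model, example_inputs).module()

# Any torch.ao quantizer can drive the NNCF quantization algorithms.
quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

calibration_dataset = nncf.Dataset([example_inputs[0]])
quantized_model = quantize_pt2e(fx_model, quantizer, calibration_dataset)
```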
alexsu52 pushed a commit that referenced this issue on Jan 28, 2025

### Changes
* torch.ao `OpenVINOQuantizer` as well as `OpenVINOQuantizerAdapter` are introduced
* The `quantize_pt2e` function is updated to work with `OpenVINOQuantizer`
### Reason for changes
* To enable OpenVINO quantization for torch.ao quantization pipelines (`torch.ao.quantization.prepare_pt2e`, `torch.ao.quantization.convert_pt2e`) and the `quantize_pt2e` API function
### Related tickets
#2766
### Tests
tests/torch/fx/test_quantizer.py is updated with use cases:
- `OpenVINOQuantizer` + `quantize_pt2e`
- `OpenVINOQuantizer` + `torch.ao.quantization.prepare_pt2e` -> `torch.ao.quantization.convert_pt2e`
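A minimal sketch of the second use case, the standard torch.ao PT2E flow driven by `OpenVINOQuantizer` (the `OpenVINOQuantizer` import path is an assumption):

```python
import torch
import torchvision.models as models
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Assumed import path for the experimental quantizer:
from nncf.experimental.torch.fx import OpenVINOQuantizer

model = models.resnet18().eval()
example_inputs = (torch.ones(1, 3, 224, 224),)
fx_model = torch.export.export_for_training(model, example_inputs).module()

# torch.ao PT2E pipeline with the NNCF OpenVINO quantizer:
prepared_model = prepare_pt2e(fx_model, OpenVINOQuantizer())
prepared_model(*example_inputs)  # calibration pass
quantized_model = convert_pt2e(prepared_model)
```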
daniil-lyakhov added a commit to daniil-lyakhov/nncf that referenced this issue on Jan 28, 2025

### Changes
* torch.ao `OpenVINOQuantizer` as well as `OpenVINOQuantizerAdapter` are introduced
* The `quantize_pt2e` function is updated to work with `OpenVINOQuantizer`
### Reason for changes
* To enable OpenVINO quantization for torch.ao quantization pipelines (`torch.ao.quantization.prepare_pt2e`, `torch.ao.quantization.convert_pt2e`) and the `quantize_pt2e` API function
### Related tickets
openvinotoolkit#2766
### Tests
tests/torch/fx/test_quantizer.py is updated with use cases:
- `OpenVINOQuantizer` + `quantize_pt2e`
- `OpenVINOQuantizer` + `torch.ao.quantization.prepare_pt2e` -> `torch.ao.quantization.convert_pt2e`
MaximProshin pushed a commit that referenced this issue on Jan 28, 2025

Follow-up of #3203:
### Changes
* torch.ao `OpenVINOQuantizer` as well as `OpenVINOQuantizerAdapter` are introduced
* The `quantize_pt2e` function is updated to work with `OpenVINOQuantizer`
### Reason for changes
* To enable OpenVINO quantization for torch.ao quantization pipelines (`torch.ao.quantization.prepare_pt2e`, `torch.ao.quantization.convert_pt2e`) and the `quantize_pt2e` API function
### Related tickets
#2766
### Tests
tests/torch/fx/test_quantizer.py is updated with use cases:
- `OpenVINOQuantizer` + `quantize_pt2e`
- `OpenVINOQuantizer` + `torch.ao.quantization.prepare_pt2e` -> `torch.ao.quantization.convert_pt2e`
🚀 Feature request
Quantization is a widely used technique to accelerate models, particularly with `torch.compile`. For detailed tutorials and demonstrations of model quantization using PyTorch 2 Export Quantization, please refer to the following resources:

These guides show how to obtain a quantized model via the PyTorch 2 Export Quantization API and run it using `torch.compile`. OpenVINO provides a backend for `torch.compile`, but NNCF does not support quantization of PyTorch 2 Export (`torch.fx.GraphModule`) models, so users have to use `X86InductorQuantizer` to quantize them. Comparisons between PyTorch 2 Export INT8 models quantized by `X86InductorQuantizer` and OpenVINO INT8 models quantized by NNCF show that NNCF produces more accurate and more efficient INT8 models.

The feature request is to support `torch.fx.GraphModule` models in `nncf.quantize` to enable the creation of accurate and highly efficient models using `torch.compile` with the OpenVINO backend.

Feature Use Case
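A minimal sketch of the requested flow, assuming `nncf.quantize` accepts a `torch.fx.GraphModule` directly and that the OpenVINO `torch.compile` backend is installed:

```python
import torch
import torchvision.models as models

import nncf

model = models.resnet18().eval()
example_inputs = (torch.ones(1, 3, 224, 224),)

# Capture the model via PyTorch 2 Export.
fx_model = torch.export.export_for_training(model, example_inputs).module()

# Requested feature: quantize the captured torch.fx.GraphModule with NNCF.
calibration_dataset = nncf.Dataset([example_inputs[0]])
quantized_model = nncf.quantize(fx_model, calibration_dataset)

# Run the INT8 model through torch.compile with the OpenVINO backend.
compiled_model = torch.compile(quantized_model, backend="openvino")
compiled_model(*example_inputs)
```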
Are you going to submit a PR?