
Exporting a model trained with mixed precision to ONNX #1890

Answered by grimoire
jtorhoff asked this question in Q&A

ONNX itself has a float16 dtype, and some of its operators, such as Conv, do support float16, so in theory you can create an ONNX model with mixed precision. Visualize your model at https://netron.app/ to check whether the weights were actually converted to fp16.
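For example, a minimal export sketch (my own illustration, not from this thread; it assumes PyTorch, torchvision ≥ 0.13, and a CUDA device) that writes an fp16 ONNX file you can then open in Netron:

```python
# Hypothetical example: export a model with fp16 weights to ONNX, then
# inspect the Conv weight dtypes in https://netron.app/.
import torch
import torchvision

# Any traceable module works here; resnet18 is just a stand-in.
model = torchvision.models.resnet18(weights=None).eval().half().cuda()
dummy = torch.randn(1, 3, 224, 224, dtype=torch.float16, device="cuda")

torch.onnx.export(
    model,
    dummy,
    "resnet18_fp16.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```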
As for fp16, TensorRT can cast the inputs/outputs of nodes on its own, without any help from ONNX. That means you can get fp16 acceleration even if the ONNX model was exported in fp32.
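A rough sketch of what that looks like with the TensorRT Python API (TensorRT 8.x style; file names are placeholders, not from this thread): the ONNX file stays fp32, and the builder flag alone enables fp16 kernels.

```python
# Hypothetical example: build an fp16 TensorRT engine from an fp32 ONNX model.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model_fp32.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # TensorRT picks fp16 kernels per layer where possible

engine_bytes = builder.build_serialized_network(network, config)
with open("model_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```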
If you want further acceleration, TensorRT can also perform explicit int8 quantization on an ONNX model that contains QuantizeLinear and DequantizeLinear nodes.
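For reference, this is what the explicit-quantization pattern looks like at the ONNX level: a per-tensor QuantizeLinear/DequantizeLinear pair carrying the scale. The snippet below is a self-contained toy graph (my own illustration, assuming the onnx Python package), not output from any tool; when such pairs wrap real layers and the INT8 builder flag is set, TensorRT runs those layers in int8 using the given scales.

```python
# Hypothetical example: a minimal ONNX graph with a QuantizeLinear /
# DequantizeLinear pair (per-tensor, scale 0.02, int8 zero point 0).
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

scale = numpy_helper.from_array(np.array(0.02, dtype=np.float32), "x_scale")
zero_point = numpy_helper.from_array(np.array(0, dtype=np.int8), "x_zp")

q = helper.make_node("QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_q"])
dq = helper.make_node("DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x_dq"])

graph = helper.make_graph(
    [q, dq],
    "qdq_example",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 224, 224])],
    [helper.make_tensor_value_info("x_dq", TensorProto.FLOAT, [1, 3, 224, 224])],
    initializer=[scale, zero_point],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "qdq_example.onnx")
```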
