torch compilation function.
- MAX_BITWIDTH_BACKWARD_COMPATIBLE
- OPSET_VERSION_FOR_ONNX_EXPORT
has_any_qnn_layers(torch_model: Module) → bool
Check if a torch model has QNN layers.
This is useful to check if a model is a QAT model.
Args:
torch_model
(torch.nn.Module): a torch model
Returns:
bool
: whether this torch model contains any QNN layer.
convert_torch_tensor_or_numpy_array_to_numpy_array(
torch_tensor_or_numpy_array: Union[Tensor, ndarray]
) → ndarray
Convert a torch tensor or a numpy array to a numpy array.
Args:
torch_tensor_or_numpy_array
(Union[Tensor, ndarray]): the value that is either a torch tensor or a numpy array.
Returns:
numpy.ndarray
: the value converted to a numpy array.
build_quantized_module(
model: Union[Module, ModelProto],
torch_inputset: Union[Tensor, ndarray, Tuple[Union[Tensor, ndarray], ]],
import_qat: bool = False,
n_bits: Union[int, Dict[str, int]] = 8,
rounding_threshold_bits: Union[NoneType, int, Dict[str, Union[str, int]]] = None,
reduce_sum_copy=False
) → QuantizedModule
Build a quantized module from a Torch or ONNX model.
Take a model in torch or ONNX, convert it to numpy, quantize its inputs, weights and outputs, and return the associated quantized module.
Args:
model
(Union[torch.nn.Module, onnx.ModelProto]): The model to quantize, either in torch or in ONNX.
torch_inputset
(Dataset): the calibration input-set, can contain either torch tensors or numpy.ndarray.
import_qat
(bool): Flag to signal that the network being imported contains quantizers in its computation graph and that Concrete ML should not re-quantize it.
n_bits
: the number of bits for the quantization.
rounding_threshold_bits
(Union[None, int, Dict[str, Union[str, int]]]): Defines precision rounding for model accumulators. Accepts None, an int, or a dict. The dict can specify 'method' (fhe.Exactness.EXACT or fhe.Exactness.APPROXIMATE) and 'n_bits' ('auto' or int).
reduce_sum_copy
(bool): if the inputs of QuantizedReduceSum should be copied to avoid bit-width propagation.
Returns:
QuantizedModule
: The resulting QuantizedModule.
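The three accepted forms of rounding_threshold_bits (None, an int, or a dict) can be illustrated with a small normalization sketch. This is not Concrete ML's implementation: normalize_rounding and the local Exactness enum are hypothetical stand-ins (the real enum is fhe.Exactness), and the defaults chosen here (EXACT as the method, 'auto' when the dict omits 'n_bits', None meaning no rounding) are assumptions.

```python
from enum import Enum
from typing import Dict, Optional, Tuple, Union


class Exactness(Enum):
    """Local stand-in for fhe.Exactness so the sketch runs without Concrete."""

    EXACT = "exact"
    APPROXIMATE = "approximate"


RoundingSpec = Union[None, int, Dict[str, Union[str, int, Exactness]]]


def _to_exactness(method: Union[str, Exactness]) -> Exactness:
    """Accept an enum member or its name as a string (case-insensitive)."""
    if isinstance(method, Exactness):
        return method
    if isinstance(method, str) and method.upper() in Exactness.__members__:
        return Exactness[method.upper()]
    raise ValueError(f"unknown rounding method: {method!r}")


def normalize_rounding(
    spec: RoundingSpec,
) -> Optional[Tuple[Union[int, str], Exactness]]:
    """Hypothetical helper: map None / int / dict to (n_bits, method)."""
    if spec is None:
        return None  # assumption: None means no rounding is applied
    if isinstance(spec, bool):
        raise TypeError("rounding_threshold_bits must not be a bool")
    if isinstance(spec, int):
        return spec, Exactness.EXACT  # assumption: EXACT is the default method
    if isinstance(spec, dict):
        n_bits = spec.get("n_bits", "auto")  # assumption: 'auto' when omitted
        if n_bits != "auto" and not isinstance(n_bits, int):
            raise ValueError("'n_bits' must be 'auto' or an int")
        return n_bits, _to_exactness(spec.get("method", Exactness.EXACT))
    raise TypeError("rounding_threshold_bits must be None, an int, or a dict")
```

For instance, normalize_rounding(6) and normalize_rounding({"n_bits": 6}) describe the same setting under these assumptions, while the dict form is the only way to request the approximate method.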
compile_torch_model(
torch_model: Module,
torch_inputset: Union[Tensor, ndarray, Tuple[Union[Tensor, ndarray], ]],
import_qat: bool = False,
configuration: Optional[Configuration] = None,
artifacts: Optional[DebugArtifacts] = None,
show_mlir: bool = False,
n_bits: Union[int, Dict[str, int]] = 8,
rounding_threshold_bits: Union[NoneType, int, Dict[str, Union[str, int]]] = None,
p_error: Optional[float] = None,
global_p_error: Optional[float] = None,
verbose: bool = False,
inputs_encryption_status: Optional[Sequence[str]] = None,
reduce_sum_copy: bool = False,
device: str = 'cpu'
) → QuantizedModule
Compile a torch module into an FHE equivalent.
Take a model in torch, convert it to numpy, quantize its inputs, weights and outputs, and finally compile it with Concrete.
Args:
torch_model
(torch.nn.Module): the model to quantize.
torch_inputset
(Dataset): the calibration input-set, can contain either torch tensors or numpy.ndarray.
import_qat
(bool): Set to True to import a network that contains quantizers and was trained using quantization aware training.
configuration
(Configuration): Configuration object to use during compilation.
artifacts
(DebugArtifacts): Artifacts object to fill during compilation.
show_mlir
(bool): if set, the MLIR produced by the converter, which is sent to the compiler backend, is shown on the screen, e.g., for debugging or demo purposes.
n_bits
(Union[int, Dict[str, int]]): number of bits for quantization, can be a single value or a dictionary with the following keys:
- "op_inputs" and "op_weights" (mandatory)
- "model_inputs" and "model_outputs" (optional, default to 5 bits)
When using a single integer for n_bits, its value is assigned to "op_inputs" and "op_weights" bits. Default is 8 bits.
rounding_threshold_bits
(Union[None, int, Dict[str, Union[str, int]]]): Defines precision rounding for model accumulators. Accepts None, an int, or a dict. The dict can specify 'method' (fhe.Exactness.EXACT or fhe.Exactness.APPROXIMATE) and 'n_bits' ('auto' or int).
p_error
(Optional[float]): probability of error of a single PBS (programmable bootstrapping).
global_p_error
(Optional[float]): probability of error of the full circuit. In FHE simulation, global_p_error is set to 0.
verbose
(bool): whether to show compilation information.
inputs_encryption_status
(Optional[Sequence[str]]): encryption status ('clear', 'encrypted') for each input. By default all arguments will be encrypted.
reduce_sum_copy
(bool): if the inputs of QuantizedReduceSum should be copied to avoid bit-width propagation.
device
: FHE compilation device, can be either 'cpu' or 'cuda'.
Returns:
QuantizedModule
: The resulting compiled QuantizedModule.
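The int-or-dict convention for n_bits can be made concrete with a short sketch. It follows the parameter description literally: "op_inputs" and "op_weights" are mandatory in the dict form, and "model_inputs" and "model_outputs" fall back to 5 bits. The helper name normalize_n_bits is hypothetical, and using the 5-bit fallback for the model-level keys when a single integer is passed is an assumption; the library's own handling may differ.

```python
from typing import Dict, Union

# Default for "model_inputs" / "model_outputs" quoted in the n_bits description.
DEFAULT_MODEL_IO_BITS = 5
MANDATORY_KEYS = ("op_inputs", "op_weights")
OPTIONAL_KEYS = ("model_inputs", "model_outputs")


def normalize_n_bits(n_bits: Union[int, Dict[str, int]] = 8) -> Dict[str, int]:
    """Hypothetical helper: expand n_bits into a dict with all four keys."""
    if isinstance(n_bits, int) and not isinstance(n_bits, bool):
        # A single integer is assigned to "op_inputs" and "op_weights".
        # Assumption: model-level keys keep the documented 5-bit default.
        return {
            "op_inputs": n_bits,
            "op_weights": n_bits,
            "model_inputs": DEFAULT_MODEL_IO_BITS,
            "model_outputs": DEFAULT_MODEL_IO_BITS,
        }
    if isinstance(n_bits, dict):
        missing = [key for key in MANDATORY_KEYS if key not in n_bits]
        if missing:
            raise ValueError(f"n_bits is missing mandatory keys: {missing}")
        unknown = set(n_bits) - set(MANDATORY_KEYS) - set(OPTIONAL_KEYS)
        if unknown:
            raise ValueError(f"n_bits has unknown keys: {sorted(unknown)}")
        result = {key: n_bits[key] for key in MANDATORY_KEYS}
        for key in OPTIONAL_KEYS:
            result[key] = n_bits.get(key, DEFAULT_MODEL_IO_BITS)
        return result
    raise TypeError("n_bits must be an int or a dict")
```

The dict form is useful when model inputs and outputs need a different precision from the internal operations, for example {"op_inputs": 4, "op_weights": 4, "model_outputs": 8}.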
compile_onnx_model(
onnx_model: ModelProto,
torch_inputset: Union[Tensor, ndarray, Tuple[Union[Tensor, ndarray], ]],
import_qat: bool = False,
configuration: Optional[Configuration] = None,
artifacts: Optional[DebugArtifacts] = None,
show_mlir: bool = False,
n_bits: Union[int, Dict[str, int]] = 8,
rounding_threshold_bits: Union[NoneType, int, Dict[str, Union[str, int]]] = None,
p_error: Optional[float] = None,
global_p_error: Optional[float] = None,
verbose: bool = False,
inputs_encryption_status: Optional[Sequence[str]] = None,
reduce_sum_copy: bool = False,
device: str = 'cpu'
) → QuantizedModule
Compile an ONNX model into an FHE equivalent.
Take a model in ONNX, convert it to numpy, quantize its inputs, weights and outputs, and finally compile it with Concrete-Python.
Args:
onnx_model
(onnx.ModelProto): the model to quantize.
torch_inputset
(Dataset): the calibration input-set, can contain either torch tensors or numpy.ndarray.
import_qat
(bool): Flag to signal that the network being imported contains quantizers in its computation graph and that Concrete ML should not re-quantize it.
configuration
(Configuration): Configuration object to use during compilation.
artifacts
(DebugArtifacts): Artifacts object to fill during compilation.
show_mlir
(bool): if set, the MLIR produced by the converter, which is sent to the compiler backend, is shown on the screen, e.g., for debugging or demo purposes.
n_bits
(Union[int, Dict[str, int]]): number of bits for quantization, can be a single value or a dictionary with the following keys:
- "op_inputs" and "op_weights" (mandatory)
- "model_inputs" and "model_outputs" (optional, default to 5 bits)
When using a single integer for n_bits, its value is assigned to "op_inputs" and "op_weights" bits. Default is 8 bits.
rounding_threshold_bits
(Union[None, int, Dict[str, Union[str, int]]]): Defines precision rounding for model accumulators. Accepts None, an int, or a dict. The dict can specify 'method' (fhe.Exactness.EXACT or fhe.Exactness.APPROXIMATE) and 'n_bits' ('auto' or int).
p_error
(Optional[float]): probability of error of a single PBS (programmable bootstrapping).
global_p_error
(Optional[float]): probability of error of the full circuit. In FHE simulation, global_p_error is set to 0.
verbose
(bool): whether to show compilation information.
inputs_encryption_status
(Optional[Sequence[str]]): encryption status ('clear', 'encrypted') for each input. By default all arguments will be encrypted.
reduce_sum_copy
(bool): if the inputs of QuantizedReduceSum should be copied to avoid bit-width propagation.
device
: FHE compilation device, can be either 'cpu' or 'cuda'.
Returns:
QuantizedModule
: The resulting compiled QuantizedModule.
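The inputs_encryption_status convention (one 'clear' or 'encrypted' entry per model input, all encrypted by default) can be sketched as a small validation routine. The helper resolve_encryption_status and its n_inputs parameter are hypothetical and only illustrate the documented behaviour; the length check is an assumption about how a mismatch would be treated.

```python
from typing import Optional, Sequence, Tuple

VALID_STATUSES = ("clear", "encrypted")


def resolve_encryption_status(
    n_inputs: int, status: Optional[Sequence[str]] = None
) -> Tuple[str, ...]:
    """Hypothetical helper: return one encryption status per model input."""
    if status is None:
        # Documented default: every argument is encrypted.
        return ("encrypted",) * n_inputs
    resolved = tuple(status)
    if len(resolved) != n_inputs:
        # Assumption: exactly one status is expected per input.
        raise ValueError(
            f"expected {n_inputs} statuses, got {len(resolved)}"
        )
    invalid = [item for item in resolved if item not in VALID_STATUSES]
    if invalid:
        raise ValueError(f"invalid encryption statuses: {invalid}")
    return resolved
```

A two-input model whose second input is public data would pass ("encrypted", "clear") under this convention.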
compile_brevitas_qat_model(
torch_model: Module,
torch_inputset: Union[Tensor, ndarray, Tuple[Union[Tensor, ndarray], ]],
n_bits: Optional[Union[int, Dict[str, int]]] = None,
configuration: Optional[Configuration] = None,
artifacts: Optional[DebugArtifacts] = None,
show_mlir: bool = False,
rounding_threshold_bits: Union[NoneType, int, Dict[str, Union[str, int]]] = None,
p_error: Optional[float] = None,
global_p_error: Optional[float] = None,
output_onnx_file: Union[NoneType, Path, str] = None,
verbose: bool = False,
inputs_encryption_status: Optional[Sequence[str]] = None,
reduce_sum_copy: bool = False,
device: str = 'cpu'
) → QuantizedModule
Compile a Brevitas Quantization Aware Training model.
The torch_model parameter must be an instance of a torch.nn.Module subclass that uses quantized operations from brevitas.qnn. The model must be trained before calling this function, which compiles the trained model to FHE.
Args:
torch_model
(torch.nn.Module): the model to quantize.
torch_inputset
(Dataset): the calibration input-set, can contain either torch tensors or numpy.ndarray.
n_bits
(Optional[Union[int, dict]]): the number of bits for the quantization. By default, for most models, a value of None should be given, which instructs Concrete ML to use the bit-widths configured using Brevitas quantization options. For some networks, which perform a non-linear operation on an input or an output, if None is given, a default value of 8 bits is used for the input/output quantization. For such models the user can also specify a dictionary with model_inputs/model_outputs keys to override the 8-bit default, or a single integer for both values.
configuration
(Configuration): Configuration object to use during compilation.
artifacts
(DebugArtifacts): Artifacts object to fill during compilation.
show_mlir
(bool): if set, the MLIR produced by the converter, which is sent to the compiler backend, is shown on the screen, e.g., for debugging or demo purposes.
rounding_threshold_bits
(Union[None, int, Dict[str, Union[str, int]]]): Defines precision rounding for model accumulators. Accepts None, an int, or a dict. The dict can specify 'method' (fhe.Exactness.EXACT or fhe.Exactness.APPROXIMATE) and 'n_bits' ('auto' or int).
p_error
(Optional[float]): probability of error of a single PBS (programmable bootstrapping).
global_p_error
(Optional[float]): probability of error of the full circuit. In FHE simulation, global_p_error is set to 0.
output_onnx_file
(Union[None, Path, str]): temporary file to store the ONNX model. If None, a temporary file is generated.
verbose
(bool): whether to show compilation information.
inputs_encryption_status
(Optional[Sequence[str]]): encryption status ('clear', 'encrypted') for each input. By default all arguments will be encrypted.
reduce_sum_copy
(bool): if the inputs of QuantizedReduceSum should be copied to avoid bit-width propagation.
device
: FHE compilation device, can be either 'cpu' or 'cuda'.
Returns:
QuantizedModule
: The resulting compiled QuantizedModule.
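The way n_bits affects input/output quantization for Brevitas models can be summarized in a short sketch: None keeps the 8-bit default, a single integer sets both values, and a dict overrides model_inputs and/or model_outputs. The helper resolve_brevitas_io_bits is hypothetical and covers only the input/output bit-widths described above; the bit-widths of the layers themselves come from the Brevitas quantization options and are not modelled here.

```python
from typing import Dict, Optional, Union

# 8-bit default for input/output quantization, as stated in the description.
DEFAULT_BREVITAS_IO_BITS = 8
IO_KEYS = ("model_inputs", "model_outputs")


def resolve_brevitas_io_bits(
    n_bits: Optional[Union[int, Dict[str, int]]] = None,
) -> Dict[str, int]:
    """Hypothetical helper: input/output bit-widths for a Brevitas QAT model."""
    if n_bits is None:
        # None: fall back to the 8-bit default for inputs and outputs.
        return {key: DEFAULT_BREVITAS_IO_BITS for key in IO_KEYS}
    if isinstance(n_bits, int) and not isinstance(n_bits, bool):
        # A single integer applies to both values.
        return {key: n_bits for key in IO_KEYS}
    if isinstance(n_bits, dict):
        unknown = set(n_bits) - set(IO_KEYS)
        if unknown:
            # Assumption: only the two documented keys are meaningful here.
            raise ValueError(f"unexpected n_bits keys: {sorted(unknown)}")
        return {key: n_bits.get(key, DEFAULT_BREVITAS_IO_BITS) for key in IO_KEYS}
    raise TypeError("n_bits must be None, an int, or a dict")
```

For example, resolve_brevitas_io_bits({"model_outputs": 6}) keeps 8 bits for the inputs and uses 6 bits for the outputs under these assumptions.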