Parameter | Description |
--- | --- |
image_name | Model server Docker image. The default is the latest public Docker image. |
deployment_parameters.replicas | Number of model server replicas. If autoscaling is enabled, it defines the initial number of replicas. |
deployment_parameters.openshift_service_mesh | When set to `true`, adds the annotations that enable the model server deployment for OpenShift Service Mesh. |
deployment_parameters.extra_envs_secret | Name of a Secret containing extra environment variables to apply in the deployed pods; for example, create it with `oc create secret generic env_secret --from-file envfile.txt`. |
deployment_parameters.extra_envs_configmap | Name of a ConfigMap containing extra environment variables to apply in the deployed pods; for example, create it with `oc create configmap env_configmap --from-literal=ENVNAME=VALUE`. |
service_parameters.grpc_port | gRPC service port; the default value is 8080 |
service_parameters.rest_port | REST API service port; the default value is 8081 |
service_parameters.service_type | Kubernetes Service type; the default value is `ClusterIP`. |
models_settings.single_model_mode | Set to `true` if only one model should be deployed; `false` indicates that a `config.json` file is used to configure multiple models. |
models_settings.config_configmap_name | ConfigMap hosting the `config.json` file. |
models_settings.config_path | Path to the config file in case it was mounted in the container via a persistent volume claim |
models_settings.model_name | Model name to be used on the client side in the remote calls |
models_settings.model_path | Path to the model folder in the model repository, for example `gs://<bucket_name>/<model_dir>`. |
models_settings.nireq | Size of the internal request queue. When set to 0 or not set, the value is calculated automatically based on available resources. |
models_settings.plugin_config | Adds OpenVINO plugin configuration for performance tuning. For example, the value `{\"PERFORMANCE_HINT\":\"LATENCY\"}` optimizes inference latency in a single-client scenario. |
models_settings.batch_size | Changes the model batch size. |
models_settings.shape | Optional; takes precedence over `batch_size`. Reshapes the model served by the model server to fit the given parameters. Accepted forms: a tuple, such as `(-1,3,100-200,224)`, which defines the shape used for all incoming requests to a model with a single input, where each dimension can be a static value such as `3`, a range such as `100-200`, or `-1` for an undefined dimension; or a dictionary of shapes, such as `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"}`, which sets the shape for multiple inputs. |
models_settings.model_version_policy | Model version policy, for example `'{"latest": { "num_versions":1 }}'`, which serves only the latest model version. |
models_settings.layout | Changes the layout of model inputs or outputs with image data; for example, `NCHW:NHWC` changes the layout from NCHW to NHWC. |
models_settings.target_device | Any supported OpenVINO target device like CPU/GPU/HDDL/MULTI/HETERO/AUTO |
models_settings.is_stateful | Set to `true` if the model is stateful. |
models_settings.idle_sequence_cleanup | If set to `true`, the model is subject to periodic sequence cleaner scans. See idle sequence cleanup. |
models_settings.low_latency_transformation | If set to `true`, the model server applies the low latency transformation on model load. |
models_settings.max_sequence_number | Determines how many sequences can be handled concurrently by a model instance. |
server_settings.file_system_poll_wait_seconds | Interval, in seconds, between checks for changes in the config file and model versions. The default value is 1; 0 disables change monitoring. |
server_settings.log_level | One of ERROR/WARNING/INFO/DEBUG |
server_settings.grpc_workers | Number of gRPC servers; the default is 1. |
server_settings.rest_workers | Number of REST server threads; the default is calculated automatically. |
models_repository.https_proxy | HTTPS proxy used to pull models from cloud storage. |
models_repository.http_proxy | HTTP proxy used to pull models from cloud storage. |
models_repository.storage_type | One of: google storage, s3, azure blob, or cluster. |
models_repository.models_host_path | Mounts a local path on the node in the container as the `/models` folder. |
models_repository.models_volume_claim | Mounts a persistent volume claim in the container as `/models`; the persistent volume claim must be created in the same namespace and populated with the model repository content. |
models_repository.runAsUser | User ID used in the security context. |
models_repository.runAsGroup | Group ID used in the security context. |
models_repository.aws_secret_access_key | S3 storage secret key; use it with S3 storage for models. |
models_repository.aws_access_key_id | S3 storage access key ID; use it with S3 storage for models. |
models_repository.aws_region | S3 storage region; use it with S3 storage for models. |
models_repository.s3_compat_api_endpoint | S3-compatible API endpoint; use it with MinIO storage for models. |
models_repository.gcp_creds_secret_name | Name of the Secret containing GCP credentials; use it with Google Storage for models. Create it with `kubectl create secret generic <secret name> --from-file gcp-creds.json`. |
models_repository.azure_storage_connection_string | Connection string for the Azure Storage account; use it with Azure Storage for models. |
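
The tuple form of `models_settings.shape` packs three kinds of dimension into one string. As an illustration only, the Python sketch below parses the tuple and dictionary forms described in the table into explicit dimension bounds. `Dim`, `parse_dim`, `parse_shape`, and `parse_shapes` are hypothetical names, not part of the model server or the operator, and the `"auto"` value is passed through unchanged rather than interpreted. For example, `parse_shape("(-1,3,100-200,224)")` yields an undefined first dimension, a static `3`, the range 100 to 200, and a static `224`.

```python
import json
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Dim:
    """One dimension: static (lo == hi), range (lo < hi), or undefined (lo is None)."""
    lo: Optional[int]
    hi: Optional[int]

    @property
    def undefined(self) -> bool:
        return self.lo is None


def parse_dim(token: str) -> Dim:
    token = token.strip()
    if token == "-1":
        return Dim(None, None)  # -1: undefined dimension
    match = re.fullmatch(r"(\d+)-(\d+)", token)
    if match:  # range such as 100-200
        lo, hi = int(match.group(1)), int(match.group(2))
        if lo > hi:
            raise ValueError(f"empty range: {token!r}")
        return Dim(lo, hi)
    if token.isdigit():  # static value such as 3
        return Dim(int(token), int(token))
    raise ValueError(f"unsupported dimension: {token!r}")


def parse_shape(text: str) -> List[Dim]:
    """Parse the tuple form, e.g. "(-1,3,100-200,224)"."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"expected a parenthesized tuple: {text!r}")
    return [parse_dim(token) for token in text[1:-1].split(",")]


def parse_shapes(text: str) -> Union[List[Dim], Dict[str, Union[List[Dim], str]]]:
    """Accept either the tuple form or the per-input dictionary form."""
    text = text.strip()
    if text.startswith("{"):
        # "auto" is kept as-is; this sketch does not model its behavior.
        return {name: value if value == "auto" else parse_shape(value)
                for name, value in json.loads(text).items()}
    return parse_shape(text)
```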

Check an example of a fully functional ModelServer resource.
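
The dotted names in the table read as nesting inside the resource `spec` (for example, `deployment_parameters.replicas`). The Python sketch below, an assumption-laden illustration rather than the operator's reference manifest, expands such dotted names into a nested manifest and prints it as JSON, which `oc apply -f` and `kubectl apply -f` accept. The `apiVersion` value `intel.com/v1alpha1`, the resource name, the model name, and the image tag are assumptions; check them against the CRD installed in your cluster. Assuming `plugin_config` is passed as a string, serializing it escapes its inner quotes, which matches the escaped form shown in the table.

```python
import json
from typing import Any, Dict


def build_model_server(name: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dotted parameter names from the table into a nested ModelServer manifest."""
    spec: Dict[str, Any] = {}
    for dotted, value in params.items():
        *parents, leaf = dotted.split(".")
        node = spec
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate sections on demand
        node[leaf] = value
    return {
        "apiVersion": "intel.com/v1alpha1",  # ASSUMPTION: verify against the installed CRD
        "kind": "ModelServer",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = build_model_server(
        "ovms-sample",  # hypothetical resource name
        {
            "image_name": "openvino/model_server:latest",  # assumed image tag
            "deployment_parameters.replicas": 1,
            "service_parameters.grpc_port": 8080,
            "service_parameters.rest_port": 8081,
            "service_parameters.service_type": "ClusterIP",
            "models_settings.single_model_mode": True,
            "models_settings.model_name": "resnet",  # hypothetical model name
            "models_settings.model_path": "gs://<bucket_name>/<model_dir>",
            "models_settings.target_device": "CPU",
            # ASSUMPTION: plugin_config is a JSON string, so json.dumps escapes its quotes
            "models_settings.plugin_config": '{"PERFORMANCE_HINT":"LATENCY"}',
            "server_settings.log_level": "INFO",
        },
    )
    print(json.dumps(manifest, indent=2))
```

Redirect the output to a file, for example `python modelserver.py > modelserver.json` (the script name is hypothetical), then apply it with `oc apply -f modelserver.json`. A Python dictionary is used here only to keep the nesting explicit; a YAML manifest with the same structure is equivalent.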