Skip to content

Latest commit

 

History

History
47 lines (45 loc) · 5.55 KB

modelserver_params.md

File metadata and controls

47 lines (45 loc) · 5.55 KB

Model server parameters

Parameter Description
image_name model server docker image. The default is the latest public docker image
deployment_parameters.replicas number if model server replicas to be used. In case if enabled autoscaling, it defines the initial number of replicas
deployment_parameters.openshift_service_mesh When the value is true, it adds the annotations enabling the models server deployment for OpenShift Service Mesh
deployment_parameters.extra_envs_secret Secret name including extra environment variables to be applied in the deployed pods oc create secret generic env_secret --from-file envfile.txt
deployment_parameters.extra_envs_configmap Configmap name including extra environment variables to be applied in the deployed pods oc create configmap env_configmap --from-literal=ENVNAME=VALUE
service_parameters.grpc_port gRPC service port; the default value is 8080
service_parameters.rest_port REST API service port; the default value is 8081
service_parameters.service_type service type; the default value is ClusterIP
models_settings.single_model_mode set true if one one model should be deployed; value false indicate that config.json file should be used to configure multiple models
models_settings.config_configmap_name Config map hosting the config.json file
models_settings.config_path Path to the config file in case it was mounted in the container via a persistent volume claim
models_settings.model_name Model name to be used on the client side in the remote calls
models_settings.model_path Path to the model folder in the model repository; for example gs://<bucket_name>/<model_dir>
models_settings.nireq The size of internal request queue. When set to 0 or no value is set value is calculated automatically based on available resources
models_settings.plugin_config Adds OpenVINO plugin configuration for tuning the performance. Value {\"PERFORMANCE_HINT\":\"LATENCY\"} optimizes the inference latency with a single client scenario
models_settings.batch_size change the model batch size
models_settings.shape shape is optional and takes precedence over batch_size. The shape argument changes the model that is enabled in the model server to fit the parameters. shape accepts three forms of the values: a tuple, such as (-1,3,100-200,224) - The tuple defines the shape to use for all incoming requests for models with a single input. Each dimension can be a static value 3, a range 100-200 or -1 which is undefined value. A dictionary of shapes, such as {"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"} set shape for multiple inputs
models_settings.model_version_policy '{"latest": { "num_versions":1 }}'
models_settings.layout Change layout of the model input or output with image data; NCHW:NHWC changes the layout from NCHW to NHWC
models_settings.target_device Any supported OpenVINO target device like CPU/GPU/HDDL/MULTI/HETERO/AUTO
models_settings.is_stateful set true it the model is stateful
models_settings.idle_sequence_cleanup If set to true, model will be subject to periodic sequence cleaner scans. See idle sequence cleanup
models_settings.low_latency_transformation If set to true, model server will apply low latency transformation on model load
models_settings.max_sequence_number Determines how many sequences can be handled concurrently by a model instance.
server_settings.file_system_poll_wait_seconds Time interval between config and model versions changes detection in seconds. Default value is 1. Zero value disables changes monitoring.
server_settings.log_level One of ERROR/WARNING/INFO/DEBUG
server_settings.grpc_workers number of gRPC servers; default is 1
server_settings.rest_workers number of REST server threads; default is calculated automatically
models_repository.https_proxy proxy to be used to pull cloud storage models
models_repository.http_proxy proxy to be used to pull cloud storage models
models_repository.storage_type one of google storage, s3, azure blob or cluster
models_repository.models_host_path Mounts node local path in container as /models folder
models_repository.models_volume_claim Mounts persistent volume claim in the container as /models; persistent Volume Claim should be create in the same namespace and populated with the model repository content
models_repository.runAsUser account security context
models_repository.runAsGroup group security context
models_repository.aws_secret_access_key S3 storage secret key, use it with S3 storage for models
models_repository.aws_access_key_id S3 storage access key id, use it with S3 storage for models
models_repository.aws_region S3 storage secret key, use it with S3 storage for models
models_repository.s3_compat_api_endpoint S3 compatibility api endpoint, use it with Minio storage for models
models_repository.gcp_creds_secret_name secret resource including GCP credentials, use it with google storage for models; create it via kubectl create secret generic <secret name> --from-file gcp-creds.json
models_repository.azure_storage_connection_string Connection string to the Azure Storage authentication account, use it with Azure storage for models

Check an example of the fully functional ModelServer resource