# SuperSONIC

Server infrastructure for GPU inference-as-a-service in large scientific experiments

The SuperSONIC project implements server infrastructure for inference-as-a-service applications in large high-energy physics (HEP) and multi-messenger astrophysics (MMA) experiments. The server infrastructure is designed for deployment on Kubernetes clusters equipped with GPUs.

The main components of SuperSONIC are:

- NVIDIA Triton inference servers
- Dynamic multi-purpose Envoy Proxy providing:
  - Load balancing
  - Rate limiting
  - GPU saturation prevention
  - Token-based authentication
- Load-based autoscaling via KEDA
- Prometheus instance (deploy a custom one or connect to an existing one)
- Pre-configured Grafana dashboard
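Clients reach the Triton servers through the Envoy endpoint using Triton's standard KServe v2 HTTP/gRPC inference APIs. As a minimal sketch, server readiness can be probed via the `/v2/health/ready` endpoint; the URL below is a hypothetical placeholder, not a SuperSONIC default:

```python
import urllib.request
import urllib.error


def triton_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if a Triton server behind `base_url` reports ready.

    Uses the KServe v2 health endpoint exposed by Triton's HTTP API.
    `base_url` is a hypothetical example, e.g. "http://triton.example.org:8000".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Unreachable endpoint or non-2xx response: treat as not ready.
        return False
```
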

## Installation

```shell
helm repo add fastml https://fastmachinelearning.org/SuperSONIC
helm repo update
helm install <release-name> fastml/supersonic --values <your-values.yaml> -n <namespace>
```

To construct the `values.yaml` file for your application, follow the Configuration guide.

The full list of configuration parameters is available in the Configuration reference.
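For illustration only, a values file typically enables or tunes the components listed above. The keys below are hypothetical and must be checked against the Configuration reference before use:

```yaml
# Hypothetical values.yaml sketch -- key names are illustrative only;
# consult the Configuration reference for the actual chart parameters.
triton:
  image: nvcr.io/nvidia/tritonserver:24.05-py3   # example image tag
  replicas: 1

autoscaler:
  enabled: true        # load-based autoscaling via KEDA
  minReplicas: 1
  maxReplicas: 4

prometheus:
  external: false      # deploy a bundled Prometheus instead of connecting to an existing one

grafana:
  enabled: true        # pre-configured dashboard
```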

## Server diagram

*(Server architecture diagram)*

## Grafana dashboard

*(Grafana dashboard screenshot)*

## Status of deployment

| Cluster       | CMS | ATLAS | IceCube |
|---------------|-----|-------|---------|
| Purdue Geddes | ✓   | -     | -       |
| Purdue Anvil  | ✓   | -     | -       |
| NRP Nautilus  | ✓   | ✓     | ✓       |