diff --git a/README.md b/README.md
index 32b98b10f..fdfe3131c 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,39 @@
-# Kubernetes AI Toolchain Operator(KAITO)
+# Kubernetes AI Toolchain Operator (Kaito)
 [![Go Report Card](https://goreportcard.com/badge/github.com/Azure/kaito)](https://goreportcard.com/report/github.com/Azure/kaito)
 ![GitHub go.mod Go version](https://img.shields.io/github/go-mod/go-version/Azure/kaito)
-KAITO has been designed to simplify the workflow of launching AI inference services against popular large open sourced AI models,
-such as Falcon or Llama, in a Kubernetes cluster.
+Kaito is an operator that automates AI/ML inference model deployment in a Kubernetes cluster.
+The target models are popular large open-source inference models such as [falcon](https://huggingface.co/tiiuae) and [llama 2](https://github.com/facebookresearch/llama).
+Compared to most of the mainstream model deployment methodologies built on top of virtual machine infrastructures, Kaito has the following key differentiators:
+- Manage large model files using container images. An HTTP server is provided to perform inference calls using the model library.
+- Avoid tuning deployment parameters to fit GPU hardware by providing preset configurations.
+- Auto-provision GPU nodes based on model requirements.
+- Host large model images in the public Microsoft Container Registry (MCR) if the license allows.
+
+Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
+
+
+## Architecture
+
+Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. The user manages a `workspace` custom resource that describes the GPU requirements and the inference specification, as sketched below. Kaito controllers automate the deployment by reconciling the `workspace` custom resource.
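+
+The following is a minimal sketch of such a resource, not an authoritative schema: the field names and values are assumptions taken from the `examples/kaito_workspace_falcon_7b.yaml` example used in the quick start below.
+```bash
+# A hypothetical minimal workspace, applied via a heredoc:
+# "resource" captures the GPU requirement; "inference" selects a model preset.
+cat <<EOF | kubectl apply -f -
+apiVersion: kaito.sh/v1alpha1
+kind: Workspace
+metadata:
+  name: workspace-falcon-7b
+resource:
+  instanceType: "Standard_NC12s_v3"
+  labelSelector:
+    matchLabels:
+      apps: falcon-7b
+inference:
+  preset:
+    name: "falcon-7b"
+EOF
+```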
+
+![Kaito architecture](docs/img/arch.png)
+
+The above figure presents the Kaito architecture overview. Its major components consist of:
+- **Workspace controller**: It reconciles the `workspace` custom resource, creates `machine` (explained below) custom resources to trigger node auto provisioning, and creates the inference workload (`deployment` or `statefulset`) based on the model preset configurations.
+- **Node provisioner controller**: The controller's name is *gpu-provisioner* in the [Kaito helm chart](charts/kaito/gpu-provisioner). It uses the `machine` CRD that originates from [Karpenter](https://github.com/aws/karpenter-core) to interact with the workspace controller. It integrates with Azure Kubernetes Service (AKS) APIs to add new GPU nodes to the AKS cluster.
+Note that the *gpu-provisioner* is not an open-source component. It can be replaced by other controllers if they support the Karpenter-core APIs.
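+
+As a hypothetical inspection (assuming the Karpenter-core `machine` CRD is available in the cluster once the gpu-provisioner is installed), one can watch the `machine` custom resources that the workspace controller creates while a workspace is being reconciled:
+```bash
+# Assumes the Karpenter-core machine CRD is installed by the gpu-provisioner.
+# Watch machine custom resources being created and provisioned.
+kubectl get machines -w
+```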
+
+
+---
 
 ## Installation
 
-The following guidence assumes **Azure Kubernetes Service(AKS)** is used to host the Kubernetes cluster .
+The following guidance assumes **Azure Kubernetes Service (AKS)** is used to host the Kubernetes cluster.
 
-### Enable Workload Identity and OIDC Issuer features
-The `gpu-povisioner` component requires the [workload identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=dotnet) feature to acquire the token to access the AKS managed cluster with proper permissions.
+#### Enable Workload Identity and OIDC Issuer features
+The *gpu-provisioner* controller requires the [workload identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=dotnet) feature to acquire the access token to the AKS cluster.
 
 ```bash
 export RESOURCE_GROUP="myResourceGroup"
@@ -18,8 +41,8 @@ export MY_CLUSTER="myCluster"
 az aks update -g $RESOURCE_GROUP -n $MY_CLUSTER --enable-oidc-issuer --enable-workload-identity --enable-managed-identity
 ```
 
-### Create an identity and assign permissions
-The identity `kaitoprovisioner` is created for the `gpu-povisioner` controller. It is assigned Contributor role for the managed cluster resource to allow changing `$MY_CLUSTER` (e.g., provisioning new nodes in it).
+#### Create an identity and assign permissions
+The identity `kaitoprovisioner` is created for the *gpu-provisioner* controller. It is assigned the Contributor role for the managed cluster resource to allow changing `$MY_CLUSTER` (e.g., provisioning new nodes in it).
 ```bash
 export SUBSCRIPTION="mySubscription"
 az identity create --name kaitoprovisioner -g $RESOURCE_GROUP
@@ -29,7 +52,7 @@ az role assignment create --assignee $IDENTITY_PRINCIPAL_ID --scope /subscriptio
 ```
 
-### Install helm charts
+#### Install helm charts
 Two charts will be installed in `$MY_CLUSTER`: `gpu-provisioner` chart and `workspace` chart.
 ```bash
 helm install workspace ./charts/kaito/workspace
@@ -49,25 +72,27 @@ helm install gpu-provisioner ./charts/kaito/gpu-provisioner
 ```
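+
+As a hypothetical sanity check (not part of the chart output; the `gpu-provisioner` namespace and service account names are inferred from the federated credential subject created in the next step), one can confirm that both releases are installed and that the controller pod exists:
+```bash
+# List the two helm releases installed above.
+helm list -A | grep -E 'gpu-provisioner|workspace'
+# The gpu-provisioner controller runs in the gpu-provisioner namespace.
+kubectl get pods -n gpu-provisioner
+```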
 
-### Create federated credential
-This allows `gpu-provisioner` controller to use `kaitoprovisioner` identity via an access token.
+#### Create the federated credential
+The federated identity credential between the managed identity `kaitoprovisioner` and the service account used by the *gpu-provisioner* controller is created.
 ```bash
 export AKS_OIDC_ISSUER=$(az aks show -n $MY_CLUSTER -g $RESOURCE_GROUP --subscription $SUBSCRIPTION --query "oidcIssuerProfile.issuerUrl" | tr -d '"')
 az identity federated-credential create --name kaito-federatedcredential --identity-name kaitoprovisioner -g $RESOURCE_GROUP --issuer $AKS_OIDC_ISSUER --subject system:serviceaccount:"gpu-provisioner:gpu-provisioner" --audience api://AzureADTokenExchange --subscription $SUBSCRIPTION
 ```
-Note that before doing this step, the `gpu-provisioner` controller pod will constantly fail with the following message in the log:
+Then the *gpu-provisioner* can access the managed cluster using a trust token with the same permissions as the `kaitoprovisioner` identity.
+Note that until this step is finished, the *gpu-provisioner* controller pod will constantly fail with the following message in the log:
 ```
 panic: Configure azure client fails. Please ensure federatedcredential has been created for identity XXXX.
 ```
 The pod will reach running state once the federated credential is created.
 
-### Clean up
+#### Clean up
 ```bash
 helm uninstall gpu-provisioner
 helm uninstall workspace
 ```
+---
 
 ## Quick start
 
 After installing Kaito, one can try the following commands to start a falcon-7b inference service.
 
@@ -88,14 +113,14 @@ inference:
 $ kubectl apply -f examples/kaito_workspace_falcon_7b.yaml
 ```
 
-The workspace status can be tracked by running the following command.
+The workspace status can be tracked by running the following command. When the WORKSPACEREADY column becomes `True`, the model has been deployed successfully.
 ```
 $ kubectl get workspace workspace-falcon-7b
 NAME                  INSTANCE            RESOURCEREADY   INFERENCEREADY   WORKSPACEREADY   AGE
 workspace-falcon-7b   Standard_NC12s_v3   True            True             True             10m
 ```
 
-Once the workspace is ready, one can find the inference service's cluster ip and use a temporal `curl` pod to test the service endpoint in cluster.
+Next, one can find the inference service's cluster IP and use a temporary `curl` pod to test the service endpoint in the cluster.
 ```
 $ kubectl get svc workspace-falcon-7b
 NAME                  TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
 $ kubectl run -it --rm --restart=Never curl --image=curlimages/curl sh
 ~ $ curl -X POST http://<CLUSTER-IP>/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"YOUR QUESTION HERE\"}"
 ```
 
-
+---
 
 ## Contributing
 
 [Read more](docs/contributing/readme.md)
diff --git a/docs/img/arch.png b/docs/img/arch.png
new file mode 100644
index 000000000..94c399faf
Binary files /dev/null and b/docs/img/arch.png differ