This document contains the steps for installing and configuring Red Hat OpenShift AI (RHOAI) on your existing OpenShift cluster.
Prior to deploying OpenShift AI, it is recommended you review the Supported Configurations documentation.
Ensure that you have cluster-admin access to an OpenShift cluster, since we will be installing several operators and configuring various components on the cluster.
The cluster must also have a functional storage provisioner available with a default StorageClass.
For GPU deployments, this repo is designed specifically to provision additional GPU nodes on AWS, but it can still serve as an example for deploying GPU resources in any cloud environment or on a self-hosted cluster with some minor modifications.
NOTE: Red Hat employees can request a demo cluster using demo.redhat.com to provision OpenShift AI. For more information see the Red Hat Demo Environment documentation.
The bootstrap script relies on the following command-line tools. If they are not already available on your system path, the bootstrap script will attempt to download them from the internet and place them in a ./tmp folder in the directory where the bootstrap script was run:
- oc - the OpenShift command-line interface (CLI), which lets you create applications and manage OpenShift Container Platform projects from a terminal.
- kustomize - a Kubernetes configuration transformation tool that enables you to customize un-templated YAML files, leaving the original files untouched.
- kubeseal - uses asymmetric crypto to encrypt secrets that only the controller can decrypt. These encrypted secrets are encoded in a SealedSecret resource, which you can see as a recipe for creating a secret.
- openshift-install (optional) - tooling that can be used for monitoring the cluster installation progress.
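As a rough pre-flight check, the sketch below reports which of the required tools are not yet on your path. The missing_tools helper is a hypothetical name introduced for illustration; it is not part of the bootstrap script, which performs its own download when a tool is absent.

```shell
# Hypothetical helper: print the name of every listed tool that is NOT on PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Required tools from the list above (openshift-install is optional and left out).
missing_tools oc kustomize kubeseal
```

Any name printed here is a tool the bootstrap script would try to download for you.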
Before running the bootstrap script, ensure that you have login access to your OpenShift cluster.
Make sure you are logged into your cluster using the oc login ... command. You can obtain a login token, if required, by using the "Copy Login Command" option found under your user profile in the OpenShift Web Console.
The scripts require a user with sufficient permissions for installing and configuring operators, typically the kubeadmin user account on a Red Hat Demo System hosted cluster.
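The login, permission, and storage prerequisites can be checked from the terminal before running anything. The following is a minimal sketch, assuming an active oc login session; the function names is_cluster_admin and default_storage_classes are hypothetical helpers, not part of this repo.

```shell
# Hypothetical helper: succeeds only if the current session may perform any
# verb on any resource in all namespaces, which is what cluster-admin grants.
is_cluster_admin() {
  [ "$(oc auth can-i '*' '*' --all-namespaces 2>/dev/null)" = "yes" ]
}

# Hypothetical helper: print the StorageClass rows that `oc get storageclass`
# marks as "(default)". Returns non-zero when no default StorageClass exists.
default_storage_classes() {
  oc get storageclass --no-headers 2>/dev/null | grep '(default)'
}
```

For example, after logging in you might run `oc whoami` to confirm the active user, then `is_cluster_admin && echo "cluster-admin: ok"` and `default_storage_classes || echo "no default StorageClass"`.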
Clone this git repository to a directory location on your local workstation, or to a Bastion server hosted within the OpenShift cluster subnet.
Execute the bootstrap script to begin the installation process:
./scripts/bootstrap.sh
When prompted to select a bootstrap folder, choose the overlay that matches your cluster version, for example: bootstrap/overlays/rhoai-eus-2.8/.
The bootstrap.sh script will now install the OpenShift GitOps Operator, create an ArgoCD instance in the openshift-gitops namespace once the operator is deployed, and then bootstrap a set of ArgoCD applications to configure the cluster.
Once the script completes, verify that you can access the ArgoCD UI using the URL output by the last line of the script execution. This URL should present an ArgoCD login page, showing that it was successfully deployed.
TODO: Add in details for the ArgoCD application menu tile within the OCP web console.
Alternatively, you can obtain the ArgoCD login URL from the ArgoCD route:
oc get routes openshift-gitops-server -n openshift-gitops
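If you only want the URL rather than the full route listing, the host can be extracted with a jsonpath query. This is a sketch, assuming the route is served over HTTPS; argocd_url is a hypothetical helper name.

```shell
# Hypothetical helper: print the ArgoCD UI URL built from the route's host.
# Assumes the openshift-gitops-server route is exposed over TLS.
argocd_url() {
  host=$(oc get route openshift-gitops-server -n openshift-gitops \
    -o jsonpath='{.spec.host}') || return 1
  [ -n "$host" ] || return 1
  printf 'https://%s\n' "$host"
}
```

Calling `argocd_url` while logged in to the cluster prints a URL you can open in a browser.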
Use the OpenShift Login option and sign in with your OpenShift credentials.
The cluster may take 10-15 minutes to finish installing and updating.
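While waiting, you can follow progress from the terminal as well as from the ArgoCD UI. The sketch below lists the ArgoCD Applications that are not yet both Synced and Healthy; it assumes the Applications live in the openshift-gitops namespace, and pending_argocd_apps is a hypothetical helper name.

```shell
# Hypothetical helper: print "NAME SYNC HEALTH" for every ArgoCD Application
# that is not yet Synced and Healthy. Prints nothing once everything has settled.
pending_argocd_apps() {
  oc get applications.argoproj.io -n openshift-gitops --no-headers \
    -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
    2>/dev/null | awk '!($2 == "Synced" && $3 == "Healthy")'
}
```

Re-running `pending_argocd_apps` every minute or so gives a rough view of what is still installing.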
Argo creates the following group in OpenShift to grant access and control inside of ArgoCD:
- gitopsadmins
To add the current user to the admin group, run:
oc adm groups add-users argocdadmins $(oc whoami)
To add the current user to the user group, run:
oc adm groups add-users argocdusers $(oc whoami)
Once the user has been added to the group, log out of Argo and log back in to apply the updated permissions. Validate that you have the correct permissions by going to the User Info menu inside of Argo and checking the user permissions.
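Group membership can also be confirmed from the command line before logging back in. The following is a sketch only; user_in_group is a hypothetical helper, and the group name you pass should be whichever group you added the user to.

```shell
# Hypothetical helper: succeeds if user $2 is listed in OpenShift group $1.
user_in_group() {
  oc get group "$1" -o jsonpath='{.users[*]}' 2>/dev/null \
    | tr ' ' '\n' | grep -qxF -- "$2"
}
```

For example, `user_in_group argocdadmins "$(oc whoami)" && echo "member"` prints "member" when the current user is in the argocdadmins group.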
To log into ArgoCD using the argocd CLI tool, run the following command:
argocd login --sso <argocd-route> --grpc-web
ArgoCD Symptoms:
Argo Applications and the child Subscription object for operator installs show Progressing for a very long time.
Explanation:
Argo uses a Health Check to determine whether an object has been successfully applied and updated, has failed, or is still progressing on the cluster. The health check for the Subscription object looks at the Condition field in the Subscription, which is updated by OLM. Once the Subscription is applied to the cluster, OLM creates several other objects in order to install the Operator. Once the Operator has been installed, OLM reports the status back to the Subscription object. This reconciliation process may take several minutes even after the Operator has successfully installed.
Resolution/Troubleshooting:
- Validate that the Operator has successfully installed via the Installed Operators section of the OpenShift Web Console.
- If the Operator has not installed, additional troubleshooting is required.
- If the Operator has successfully installed, feel free to ignore the Progressing state and proceed. OLM should reconcile the status after several minutes, and Argo will update the state to Healthy.
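The first check above can also be done from the terminal by looking at the ClusterServiceVersion (CSV) objects that OLM creates for each Operator. This is a minimal sketch, assuming that a CSV phase of Succeeded corresponds to an Operator shown as installed in the web console; unfinished_operator_installs is a hypothetical helper name.

```shell
# Hypothetical helper: print "NAMESPACE NAME PHASE" for every CSV whose phase
# is not Succeeded. Prints nothing when all Operators have finished installing.
unfinished_operator_installs() {
  oc get clusterserviceversions.operators.coreos.com --all-namespaces --no-headers \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
    2>/dev/null | awk '$3 != "Succeeded"'
}
```

If `unfinished_operator_installs` prints nothing while Argo still shows Progressing, the Operators are installed and the state should change once OLM reconciles the Subscription.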