From ad728b71ebe80d2d097e64b21b43064fa6ccddfd Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Thu, 12 Dec 2024 16:51:02 +0000 Subject: [PATCH 01/12] Add files via upload Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- notes/millingw/DeployClusterAPI.md | 798 +++++++++++++++++++++++++++++ 1 file changed, 798 insertions(+) create mode 100644 notes/millingw/DeployClusterAPI.md diff --git a/notes/millingw/DeployClusterAPI.md b/notes/millingw/DeployClusterAPI.md new file mode 100644 index 00000000..c696795b --- /dev/null +++ b/notes/millingw/DeployClusterAPI.md @@ -0,0 +1,798 @@ +# Deploy Kubernetes Cluster on Arcus with ClusterCtl + +Based on Amy's notes https://git.ecdf.ed.ac.uk/akrause/openstack-bits-and-pieces/-/blob/main/ClusterAPI/CreateCluster.md + +Manila deployment based on Paul Browne's notes https://gitlab.developers.cam.ac.uk/pfb29/manila-csi-kubespray + +Used VM "gaia_dataset_one" in somerville gaia_jade project as command and control VM. + +Management cluster created in Somerville gaia_jade project using CAPI Magnum command line client, although management cluster could in theory be anywhere with vpn access. + +Prerequisites: + +Existing kubernetes cluster (management cluster): used existing cluster "malcolm_k8s" on somerville, created using Magnum python client. +However, process for creating initial cluster should not matter here. +Access to target OpenStack instance where new cluster will be generated. +A source recent ubuntu image must already be present in the target OpenStack project. +These notes assume a useable project-level router has already been provisioned in the target OpenStack project. + +Required software: +On command / control machine, need to install python, ansible, kubectl, clusterctl, packer (and dependencies). +Need ansible / packer to build images on target OpenStack instance +Need clusterctl for cluster template generation / deployment +Need kubeconfig for management cluster, access credentials for target openstack cluster. + +On gaia_dataset_one VM (on Somerville): + +Install kubectl: + +``` +curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" +sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl +``` + +Install dependencies +``` +pip install python-dev +pip install python-openstackclient +pip install python-magnumclient +pip install ansible +sudo dnf install make +sudo dnf install git +sudo dnf install wget +sudo dnf install yq +``` +# Create and export boostrap cluster details so that we can access it with kubectl (assuming clouds.yaml etc already points to bootstrap OpenStack instance) +``` +openstack coe cluster config --dir /home/rocky/openstack/k8sdir --force --output-certs malcolm_k8s --os-cloud somerville-jade +export KUBECONFIG=/home/rocky/openstack/k8sdir/config +KUBECONFIG now points at our (yet-to-be-initialised) management cluster +``` +# Install clusterctl: + +``` +curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.8.1/clusterctl-linux-amd64 -o clusterctl +sudo install -o root -g root -m 0755 clusterctl /usr/local/bin/clusterctl +``` + +Initialise the management cluster for deploying k8s into OpenStack clouds. +This turns our starting magnum-created kubernetes cluster into a ClusterAPI management cluster. + +``` +clusterctl init --infrastructure openstack +``` + +Our cluster on Somerville is now our management cluster. 
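As a quick sanity check before moving on, it is worth confirming that the ClusterAPI and OpenStack provider controllers came up; the namespace names below assume the default layout created by `clusterctl init`:

```
kubectl get pods -n capi-system
kubectl get pods -n capo-system
clusterctl version
```

All pods should reach the Running state before attempting any cluster deployments.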
+ +# Build CAPI image in target OpenStack environment: + +Next, we need to build a control image in our target OpenStack environment + +Install Packer on command/control VM: + +``` +curl https://releases.hashicorp.com/packer/1.11.2/packer_1.11.2_linux_amd64.zip --output packer_1.11.2_linux_amd64.zip +unzip packer_1.11.2_linux_amd64.zip +cd packer +sudo mv packer /usr/local/bin/packer +``` + +Create reqs-build.pkr.hcl + +``` +packer { + required_plugins { + openstack = { + version = ">= 1.1.2" + source = "github.com/hashicorp/openstack" + } + } +} +packer { + required_plugins { + ansible = { + version = ">= 1.1.1" + source = "github.com/hashicorp/ansible" + } + } +} + +packer init reqs-build.pkr.hcl +``` + +create packer_var_file.json, edited for arcus red project + +Note that I had to add packer_build_ingest security group to arcus project to allow ssh access for packer to build image +"networks" is existing router in OpenStack project, did not have to create this +CUDN-Internet is existing floating ip pool name in gaia red project +Had to work out flavor and image name from looking at options in the arcus gaia red OpenStack project and doing some trial VM creations to get good combinations +source_image has to be the name of an existing Ubuntu image in the target OpenStack project +image_name is the name of the CAPI magnum image that will be built in the target OpenStack project (ie a new image will be built with this name) + +``` +{ + "source_image": "Ubuntu-Jammy-22.04-20240514", + "network_discovery_cidrs": "10.1.0.0/24", + "networks": "77c534e1-1de2-400b-a315-9d1c9768c99f", + "flavor": "gaia.vm.cclake.26vcpu", + "floating_ip_network": "CUDN-Internet", + "image_name": "Ubuntu-Jammy-22.04-20240514-kube-1.30.2", + "image_visibility": "private", + "image_disk_format": "raw", + "volume_type": "", + "ssh_username": "ubuntu", + "kubernetes_deb_version": "1.30.2-1.1", + "kubernetes_semver": "v1.30.2", + "kubernetes_series": "v1.30", + "security_groups": "packer_build_ingest" +} +``` + +build the CAPI image in the target OpenStack project: + +``` +cd image-builder/images/capi +PACKER_VAR_FILES=/path/to/packer_var_file.json make build-openstack-ubuntu-2204 +take some time to run, generates new image Ubuntu-Jammy-22.04-20240514-kube-1.30.2 in the target OpenStack project +Check in the OpenStack project that the image built ok (either via the openstack client, or via the Horizon GUI for the target OpenStack project +``` + +# Create new Kubernetes cluster for actual use + +The following assumes the management cluster is up and running. + +## Create application credentials + +Create application credentials in Openstack for the target project (here, iris-gaia-red on Arcus) where the Kubernetes cluster will be created and store in `arcus-red.yaml`. 
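If a suitable application credential does not already exist, one way to create it is with the OpenStack command line client while authenticated against the target project (the credential name here is just an example); the command prints the id and secret that go into the clouds file below:

```
openstack application credential create capi-iris-gaia-red
```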
+ +``` +arcus-red.yaml +clouds: + + + iris-gaia-red: + auth: + auth_url: https://arcus.openstack.hpc.cam.ac.uk:5000 + application_credential_id: "*********" + application_credential_secret: "******" + region_name: "RegionOne" + interface: "public" + identity_api_version: 3 + auth_type: "v3applicationcredential" +``` + +## Set up environment + +Get the OpenStack API server certificates by browsing to the horizon interface, click on the padlock symbol, view certificates, download certificate chain +If necessary, create a new keypair in the OpenStack project that will used to access OpenStack during the cluster creation +Notes assume server certificates saved to arcus-openstack-hpc-cam-ac-uk.pem + +Create environment variable script for configuring clusterctl deployment. +Note that a value must be supplied for OPENSTACK_DNS_NAMESERVERS must be supplied for the config file generation; however, it may be necessary to edit or delete this from the generated config file (see below). +(We've seen that on Arcus the value is ignored, but on BSC it is used directly) + +``` +capi-arcus-red-vars.sh: + +#! /bin/bash + +b64encode(){ + # Check if wrap is supported. Otherwise, break is supported. + if echo | base64 --wrap=0 &> /dev/null; then + base64 --wrap=0 $1 + else + base64 --break=0 $1 + fi +} + +export OPENSTACK_CLOUD=iris-gaia-red +export OPENSTACK_CLOUD_YAML_B64=$( cat arcus-red.yaml | b64encode ) +export OPENSTACK_CLOUD_CACERT_B64=$( cat arcus-openstack-hpc-cam-ac-uk.pem | b64encode ) +export OPENSTACK_FAILURE_DOMAIN=nova +export OPENSTACK_EXTERNAL_NETWORK_ID=57add367-d205-4030-a929-d75617a7c63e +export OPENSTACK_CONTROL_PLANE_MACHINE_FLAVOR=vm.v1.small +export OPENSTACK_NODE_MACHINE_FLAVOR=gaia.vm.cclake.26vcpu +export OPENSTACK_IMAGE_NAME=Ubuntu-Jammy-22.04-20240514-kube-1.30.2 +export OPENSTACK_SSH_KEY_NAME=iris-malcolm-kube-test-keypair +export OPENSTACK_DNS_NAMESERVERS=8.8.8.8 + +export KUBERNETES_VERSION=1.30.2 + +# optional +export CLUSTER_NAME=iris-gaia-red +export CONTROL_PLANE_MACHINE_COUNT=3 +export WORKER_MACHINE_COUNT=4 +``` + +Source the above file to populate the environment variables: +``` +source capi-arcus-red-vars.sh +``` + +To interact with the management cluster, ensure that you are using the correct kubeconfig: +``` +export KUBECONFIG=/home/rocky/openstack/k8sdir/config +``` + +## Create ClusterAPI config + +# generate a template file for the new cluster using the environment variables we set +# capi-red.yaml will be an openstack-specific, project specific template file for building a new k8s cluster +# this does not actually create a cluster, just a new template for building a cluster + +clusterctl generate cluster iris-gaia-red > capi-red.yaml + +Note that we can't check the generated yaml file into public github, as it contains (base64-encoded) access credentials for OpenStack + +The DNS configuration isn't required although the generate script insists that the environment variable is set. +You can remove the dns server reference from the config yaml ("dnsNameservers", see below), if not required. (See above note about BSC) + +Specify the loadbalancer provider `ovn`in capi-red.yaml: + +``` +kind: OpenStackCluster +metadata: + name: iris-gaia-red + namespace: default +spec: + apiServerLoadBalancer: + enabled: true + provider: ovn + ... +``` + +By default ClusterAPI will try to create a new private network for the kubernetes cluster. +We don't always want this. 
For example, if the network needs to talk to other services that we haven't configured in the template (such as ceph), we may want to use an existing network. +In the generated template, a section "managedSubnets" will appear under "OpenStackCluster". Remove the definition of cluster.managedSubnets and instead use cluster.network to specify an existing network. For example: + +``` +kind: OpenStackCluster +metadata: + name: iris-gaia-red + namespace: default +spec: + ... + network: + filter: + name: kubernetes-bootstrap-network +``` + +``` +managedSubnets: + - cidr: 10.6.0.0/24 + dnsNameservers: + - 84.88.52.35 +``` + + +If we are building a new network, the value we specified for the dns name server is injected via the value for dnsNameservers. +The behaviour here appears to be system-dependent. +On Arcus, the value we set appears to be ignored +On BSC, the value, if supplied, is used directly and must be correct. However, if dnsNameservers is deleted from the config file, the correct dns name server is used by default. + +Probably a good idea to have fairly large root volumes on our nodes; kubernetes seems to want to fill these fast. +Set rootVolume in our templates in the following places: + +``` +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: OpenStackMachineTemplate +metadata: + name: iris-gaia-red-ceph-control-plane + namespace: default +spec: + template: + spec: + flavor: gaia.vm.cclake.4vcpu + image: + filter: + name: Ubuntu-Jammy-22.04-20240514-kube-1.30.2 + sshKeyName: iris-malcolm-kube-test-keypair + rootVolume: + sizeGiB: 100 +--- +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: OpenStackMachineTemplate +metadata: + name: iris-gaia-red-ceph-md-0 + namespace: default +spec: + template: + spec: + flavor: gaia.vm.cclake.26vcpu + image: + filter: + name: Ubuntu-Jammy-22.04-20240514-kube-1.30.2 + sshKeyName: iris-malcolm-kube-test-keypair + rootVolume: + sizeGiB: 200 +``` + +## Create cluster +Use the management cluster to actually build the new cluster, in our target environment, using the image that we prebuilt earlier in the target project. + +``` +kubectl apply -f capi-red.yaml +``` + +## Check progress + +``` +export CLUSTER_NAME=iris-gaia-red +clusterctl describe cluster ${CLUSTER_NAME} +``` + +Once the first machines in the control plane have been created: + +Download kubeconfig: + +``` +clusterctl get kubeconfig ${CLUSTER_NAME} > ${CLUSTER_NAME}.kubeconfig +``` + +## Complete setup + +The cluster will not complete until the network configuration is created. + +Install Calico CNI +``` +curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml -O +kubectl --kubeconfig=${CLUSTER_NAME}.kubeconfig apply -f calico.yaml +``` + +Get network id of the private network of the cluster. The name starts with `k8s-clusterapi-`. +Get this from the Horizon GUI, or from the openstack client +(If we specified an existing network, get its ID instead) +Note that if we use an existing network, the configuration file only needs to be edited once, as the network ID will be fixed unless the network is deleted / recreated + +Create the Openstack cloud controller configuration `appcred-iris-gaia-red.conf`, add the application credentials and the private network id. +This file will be used to create a kubernetes secret, which will then be used by the system setup +On Arcus, we just use the default load balancer, amphora. 
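The private network id can also be looked up with the openstack client rather than Horizon, for example (the network name shown is the generated one for this cluster; substitute as appropriate):

```
openstack network show k8s-clusterapi-cluster-default-iris-gaia-red -f value -c id
```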
+ +``` +[Global] +auth-url=https://arcus.openstack.hpc.cam.ac.uk:5000 +region="RegionOne" +application-credential-id="****" +application-credential-secret="****" + +[LoadBalancer] +use-octavia=true +floating-network-id=d5560abe-c5d5-4653-a2f7-59636448f8fe +network-id=34de53cc-5b49-489b-9d02-93a31ab7812f +``` + +Finish network setup and install the Openstack cloud controller to the cluster. + +``` +kubectl --kubeconfig=${CLUSTER_NAME}.kubeconfig create secret -n kube-system generic cloud-config --from-file=cloud.conf=appcred-iris-gaia-red.conf +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/cloud-controller-manager-roles.yaml +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/cloud-controller-manager-role-bindings.yaml +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/openstack-cloud-controller-manager-ds.yaml +``` + +Now the cluster setup completes. +Watch progress +``` +clusterctl describe cluster ${CLUSTER_NAME} +``` + +The cluster initialises with no available storage classes, therefore applications cannot immediately be deployed. + +# Install cinder driver +Install the cinder helm chart + + +Edit cinder-values.yaml to match our deployed cluster. We point it at the secret we already created during the calico installation + +``` +secret: + enabled: true + name: cloud-config +``` + +# now deploy into our cluster +helm install --namespace=kube-system -f cinder-values.yaml --kubeconfig=./${CLUSTER_NAME}.kubeconfig cinder-csi cpo/openstack-cinder-csi + +# verify the storage classes were created +```` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get storageclass +NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE +csi-cinder-sc-delete cinder.csi.openstack.org Delete Immediate true 11d +csi-cinder-sc-retain cinder.csi.openstack.org Retain Immediate true 11d +```` + + +# Network configuration +If we specified an already-existing network in our template, we assume that the network has already had all the necessary configuration applied. +If we didn't specify a network, we need to do some work in the Horizon GUI to connect our generated network to the CEPHFS network. +Our generated network will have a name k8s-clusterapi-cluster-default-<$CLUSTER_NAME> + +In Horizon: +Cephfs router -> Add New Interface -> select k8s-clusterapi-cluster-default-iris-gaia-red, add unused IP address e.g. 10.6.0.10 +Networks -> select k8s-clusterapi-cluster-default-iris-gaia-red-> Edit Subnet -> Subnet Details. Added host route 10.4.200.0/24,10.6.0.10 +Add a new bastion host VM on k8s-clusterapi-cluster-default-iris-gaia-red network, add new floating ip address to permit ssh access +Log into bastion host to access kubernetes worker nodes +On each node, as root run sudo ip route add 10.4.200.0/24 via 10.6.0.10 +(We need to manually apply the routing on each node as the routing is normally only applied on VM creation) + +Note: it should be possible to automate this through the ClusterAPI template, but still work in progress for now ... + +# mount data shares +At this point our cluster is ready to use. However, we need to be able to access the GAIA DR3 (and potentially other) data from our services. 
+On the arcus deployment, data is held in a separate project ("iris-gaia-data") within the same physical hardware. +In the Horizon GUI, select iris-gaia-data in the project list, then navigate to "shares". +Identify the required data share, and note the share path and the associated cephx access rule and key. +In Horizon, if one doesn't already exist, create a bastion VM on the same network as the kubernetes cluster, and assign a public floating ip address to allow ssh access. +Log into the bastion VM, and log into each of the worker nodes. +Note that ceph is very fussy about consistent naming throughout. The name of the keyring file must be consistent with the name of the access rule ("grants access to") itself. +Do the following on each worker node, for each data share that we want to mount (access via bastion host). +ceph.conf file shown here for ceph on Arcus. Will be different for other systems. + + +``` +# apt update; apt dist-upgrade -y; apt-get install ceph-common -y +# vim /etc/ceph/ceph.conf +# cat /etc/ceph/ceph.conf +[global] +fsid = a900cf30-f8a3-42bf-98d6-af7ce92f1a1a +mon_host = [v2:10.4.200.13:3300/0,v1:10.4.200.13:6789/0] [v2:10.4.200.9:3300/0,v1:10.4.200.9:6789/0] [v2:10.4.200.17:3300/0,v1:10.4.200.17:6789/0] [v2:10.4.200.26:3300/0,v1:10.4.200.26:6789/0] [v2:10.4.200.25:3300/0,v1:10.4.200.25:6789/0] + + +# Provision the Manila-generated CephX key +root@pfb29-test:~# vim ceph.client.dr3_data_share.keyring +root@pfb29-test:~# chmod 0600 ceph.client.dr3_data_share.keyring +root@pfb29-test:~# cat ceph.client.dr3_data_share.keyring +[client.dr3_data_share] + key = $REDACTED + + +# Provision the Manila-generated export path to an env-var, make client mountpoint directory +# here, EXPORT_PATH is the data share path shown in Horizon for the share +root@pfb29-test:~# export EXPORT_PATH="10.4.200.9:6789,10.4.200.13:6789,10.4.200.17:6789,10.4.200.25:6789,10.4.200.26:6789:/volumes/_nogroup/fa5309a4-1b69-4713-b298-c8d7a479f86f/d53177c6-c45c-4583-9947-d50ab931445c" +root@pfb29-test:~# mkdir -p /mnt/dr3_data_share + + +# Mount and stat the CephFS share +root@pfb29-test:~# mount -t ceph $EXPORT_PATH /mnt/dr3_data_share -o name=dr3_data_share +root@pfb29-test:~# df -h -t ceph +Filesystem Size Used Avail Use% Mounted on +10.4.200.9:6789,10.4.200.13:6789,10.4.200.17:6789,10.4.200.25:6789,10.4.200.26:6789:/volumes/_nogroup/fa5309a4-1b69-4713-b298-c8d7a479f86f/d53177c6-c45c-4583-9947-d50ab931445c 10G 0 10G 0% /mnt/cephfs +``` + +Note to self - write a script to automate the above! + +Now that all our workers have the data share mounted, we can access it via a hostPath mount from our pods, eg + +``` +spec: + volumes: + - name: mount-this + hostPath: + path: /mnt/dr3_data_share + type: Directory + containers: + - volumeMounts: + - mountPath: /mnt/dr3_data_share + name: mount-this + readOnly: true +``` + +The (read-only) DR3 data should now be accessible in the pod at /mnt/dr3_data_share + +## rescale cluster + +The management cluster is used to view active workers and rescale a running worker cluster, via the machinedeployments class. +e.g. 
+ +``` +$ kubectl get machinedeployment +NAME CLUSTER REPLICAS READY UPDATED UNAVAILABLE PHASE AGE VERSION +bsc-gaia-md-0 bsc-gaia 3 3 3 0 Running 25h v1.30.2 +iris-gaia-red-ceph-md-0 iris-gaia-red-ceph 4 4 4 0 Running 22d v1.30.2 +iris-gaia-red-demo-md-0 iris-gaia-red-demo 7 7 7 0 Running 6d2h v1.30.2 + +$ kubectl scale machinedeployment iris-gaia-red-demo-md-0 --replicas=9 + +``` + +Note that with our current deployment, new VMs will not automatically get the ceph mounts. This will require manual intervention to perform the ceph configuration + +# Deleting a cluster + +Before deleting a cluster, note that CAPI struggles to delete resources that were created within the cluster, such as services, load balancers etc. +Applications should be deleted in reverse order of creation before trying to delete the cluster, especially those managing load balancers and floating ip addresses. +This may be useful in making deletions cleaner, haven't tried it yet ... https://github.com/azimuth-cloud/cluster-api-janitor-openstack + +To delete a CAPI-deployed cluster: + +``` +kubectl delete cluster ${CLUSTER_NAME} +``` + +Note we don't specify --kubeconfig here, as we are using the management cluster (ie pointed to by ${KUBECONFIG}) to control the cluster teardown + +## Manual deletion + +Sometimes things don't go smoothly during deployment, particularly when getting up and running at a new site. +The management cluster can get confused about the state of the remote cluster. +If this happens, easiest way to clean up is to manually delete all the created resources in the target environment, then purge references from the management cluster. +The following classes need to be purged for the failed cluster, in the following order: OpenStackMachines, OpenStackMachineTemplates, OpenStackClusterTemplate + +e.g. + +``` +$ kubectl get openstackmachines +NAME CLUSTER INSTANCESTATE READY PROVIDERID MACHINE AGE +bsc-gaia-control-plane-r94xt bsc-gaia ACTIVE true openstack:///25a0e44a-f037-4418-a515-cb2da0e4f3ff bsc-gaia-control-plane-r94xt 25h +bsc-gaia-md-0-xqdtp-52fm7 bsc-gaia ACTIVE true openstack:///dc4a2f10-6277-41e5-a6f6-10ef6278df97 bsc-gaia-md-0-xqdtp-52fm725h + +$kubectl delete openstackmachine bsc-gaia-md-0-xqdtp-52fm7 +``` + +Once all resources have been deleted from the management cluster, the cluster itself can be deleted. +To force deletion, it may be necessary to delete the cluster finaliser by editing the clustertemplate object + +``` +$ kubectl get openstackclusters +NAME CLUSTER READY NETWORK BASTION IP AGE +bsc-gaia bsc-gaia true b32e99b0-e3f8-4318-b0fb-9fa1ea3d4bf9 25h + +$ kubectl edit openstackcluster bsc-gaia (opens config in vim) +replace value for finalisers with [] and save out + +# Management cluster failure / deletion + +If we lose the management cluster for any reason, its not the end of the world. +The deployed clusters will still function independently, assuming we have their KUBECONFIG files. +However, we should do everything to avoid this happening ... + + +## Ceph and Manila CSI configuration + +Warning! Work in progress from this point ... 
+ + +# install the ceph csi driver +# followed notes at https://gitlab.developers.cam.ac.uk/pfb29/manila-csi-kubespray + +``` +helm repo add ceph-csi https://ceph.github.io/csi-charts +helm --kubeconfig=./${CLUSTER_NAME}.kubeconfig install --namespace kube-system ceph-csi-cephfs ceph-csi/ceph-csi-cephfs +``` + +# install the manila csi driver + +manila-values.yaml + +``` +--- +shareProtocols: + - protocolSelector: CEPHFS + fsGroupPolicy: None + fwdNodePluginEndpoint: + dir: /var/lib/kubelet/plugins/cephfs.csi.ceph.com + sockFile: csi.sock +``` + +``` +helm repo add cpo https://kubernetes.github.io/cloud-provider-openstack +helm install --kubeconfig=./${CLUSTER_NAME}.kubeconfig --namespace kube-system manila-csi cpo/openstack-manila-csi -f manila-values.yaml +``` + +# Create a secret for deploying our manila storage class, assumes we created an access credential in the target OpenStack project with suitable priviledges + +secrets.yaml + +``` +apiVersion: v1 +kind: Secret +metadata: + name: csi-manila-secrets + namespace: default +stringData: + # Mandatory + os-authURL: "https://arcus.openstack.hpc.cam.ac.uk:5000/v3" + os-region: "RegionOne" + + # Authentication using user credentials + os-applicationCredentialID: "*****" + os-applicationCredentialSecret: "*******" +``` + +``` +kubectl apply --kubeconfig=./${CLUSTER_NAME}.kubeconfig -f secrets.yaml +``` + +# create a manila storage class using the access secret we just created + +``` + +sc.yaml +--- +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: csi-manila-cephfs +provisioner: cephfs.manila.csi.openstack.org +parameters: + type: ceph01_cephfs # Manila share type + cephfs-mounter: kernel + csi.storage.k8s.io/provisioner-secret-name: csi-manila-secrets + csi.storage.k8s.io/provisioner-secret-namespace: default + csi.storage.k8s.io/node-stage-secret-name: csi-manila-secrets + csi.storage.k8s.io/node-stage-secret-namespace: default + csi.storage.k8s.io/node-publish-secret-name: csi-manila-secrets + csi.storage.k8s.io/node-publish-secret-namespace: default +``` + +``` +kubectl apply --kubeconfig=./${CLUSTER_NAME}.kubeconfig -f sc.yaml +``` + +# make manila the default storage class + +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig patch storageclass csi-manila-cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' +``` + +# list the storage classes in the cluster +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get storageclass +NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE +csi-cinder-sc-delete cinder.csi.openstack.org Delete Immediate true 12d +csi-cinder-sc-retain cinder.csi.openstack.org Retain Immediate true 12d +csi-manila-cephfs (default) cephfs.manila.csi.openstack.org Delete Immediate false 5d5 +``` + +# test access to cephfs service +In Horizon GUI, manually create a share. Create a cephx access rule, then copy the access key and full storage path + +Create a secret containing the access key + +ceph-secret.yaml +``` +apiVersion: v1 +kind: Secret +metadata: + name: ceph-secret +stringData: + key: **** +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f ceph-secret.yaml + +Create a test pod that mounts the ceph share as a volume. 
The ceph share path needs to be separated into a list of monitor addresses and the relative path, eg + +pod.yaml + +``` +--- +apiVersion: v1 +kind: Pod +metadata: + name: test-cephfs-share-pod +spec: + containers: + - name: web-server + image: nginx + imagePullPolicy: IfNotPresent + volumeMounts: + - name: testpvc + mountPath: /var/lib/www + - name: cephfs + mountPath: "/mnt/cephfs" + volumes: + - name: testpvc + persistentVolumeClaim: + claimName: test-cephfs-share-pvc + readOnly: false + - name: cephfs + cephfs: + monitors: + - 10.4.200.9:6789 + - 10.4.200.13:6789 + - 10.4.200.17:6789 + - 10.4.200.25:6789 + - 10.4.200.26:6789 + secretRef: + name: ceph-secret + readOnly: false + path: "/volumes/_nogroup/ca890f73-3e33-4e07-879c-f7ec0f5a8a17/52bcd13b-a358-40f0-9ffa-4334eb1e06ae" +``` + +Example uses nginx, so install that: + +``` +helm install --kubeconfig=./${CLUSTER_NAME}.kubeconfig nginx bitnami/nginx +``` + +deploy the pod +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f manila-csi-kubespray/pod.yaml +``` + +Inspect the pod to verify that the ceph share was successfully mounted + +# test jhub deployment, check where user areas get created + +deploy jhub, check where user area is created + +``` +helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/ +helm --kubeconfig=./${CLUSTER_NAME}.kubeconfig upgrade --install jhub jupyterhub/jupyterhub --version=3.3.8 +``` + +# port forward on control VM +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig --namespace=default port-forward service/proxy-public 8080:http +``` + +# port forward on laptop: +ssh -i "gaia_jade_test_malcolm.pem" -L 8080:127.0.0.1:8080 rocky@192.41.122.174 +browse to 127.0.0.1:8080 and login, eg as user 'hhh' + +# on control VM, list pvs/pvcs +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get pv +NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE 6h56m +pvc-8b970f5c-440b-48f8-ae19-4fb35d20e85f 10Gi RWO Delete Bound default/claim-hhh csi-manila-cephfs 6h51m +pvc-7d104b45-7efe-4250-b9fe-5bf441eb65a9 1Gi RWO Delete Bound default/hub-db-dir csi-manila-cephfs + +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get pvc +NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE +claim-hhh Bound pvc-8b970f5c-440b-48f8-ae19-4fb35d20e85f 10Gi RWO csi-manila-cephfs 6h52m +hub-db-dir Bound pvc-7d104b45-7efe-4250-b9fe-5bf441eb65a9 1Gi RWO csi-manila-cephfs 6h58m + +## Thoughts on automation and migration + +Each system that we deploy to will have different networking setup, storage services, image names, machine flavour. +Each system requires that a ClusterAPI image be built in that system from an Ubuntu image already present in that system. +For each system, we generate a configuration file using clusterctl generate. +Getting a working generation image and working combinations of images / flavours likely to be a trial and error process, little prospect for automation +Once we have a working template for a given site, that template can be reused for that site, but that site only. +Given a particular site with a working template, it should be possibe to automate creation of a cluster at that site. +Each site will require specific post-creation configuration, e.g. ceph mounts on Arcus, nfs(?) mounts on BSC + +Manual stages: +Install packer, clusterctl, server certificates etc. +Manually build / test image in target environment, get working combinations of flavours and boot disk sizes. +Generate template file, adjust any arguments. 
+Once we've got this far, can automate using the template. +Note that we can't check templates into a repo, as they contain security information + +Automated stages: + +kubectl apply template file +clusterctl describe until ready +get kubeconfig file +apply calico +use openstack to lookup network id for new network (how do we get cluster name? from environment variable?) +build application secret conf file +build secret in target environment +complete setup +install cinder storage classes + +do site-specific post-installation: +get list of worker names via kubectl get nodes +install ceph client on each worker node +configure ceph on each worker node +- mount ceph shares on Arcus. need list of shares to mount, lookup keys and create share mount on each worker VM +- attach shared volumes on Somerville, BSC? ) +- modify /etc/fstab rather than configuring from directory? + +Things to try: +Automatic configuration of ceph network on arcus +attach manila shares to pod instead of using ceph mounts (wont be available at every site) + +Generic scripts: + +lookup network id, build conf file +lookup keys for ceph shares +install list of ceph shares on VMs +get list of worker node names and ip addresses + + + + + + + + From 05709388a950fa38c33e198eda30f16e4812eba7 Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Thu, 12 Dec 2024 16:54:04 +0000 Subject: [PATCH 02/12] Create Readme.MD Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- notes/millingw/ClusterAPIScripts/Readme.MD | 1 + 1 file changed, 1 insertion(+) create mode 100644 notes/millingw/ClusterAPIScripts/Readme.MD diff --git a/notes/millingw/ClusterAPIScripts/Readme.MD b/notes/millingw/ClusterAPIScripts/Readme.MD new file mode 100644 index 00000000..6d065a69 --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/Readme.MD @@ -0,0 +1 @@ +### Placeholder for example ClusterAPI related scripts and things From 806f4905efbe54f20c4a1d24373bfb338090ff6d Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Fri, 13 Dec 2024 11:02:31 +0000 Subject: [PATCH 03/12] Add files via upload Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- .../appcred-iris-gaia-red-demo.conf | 11 ++ .../ClusterAPIScripts/arcus-red-demo.yaml | 176 ++++++++++++++++++ .../ClusterAPIScripts/capi-arcus-demo.sh | 32 ++++ 3 files changed, 219 insertions(+) create mode 100644 notes/millingw/ClusterAPIScripts/appcred-iris-gaia-red-demo.conf create mode 100644 notes/millingw/ClusterAPIScripts/arcus-red-demo.yaml create mode 100644 notes/millingw/ClusterAPIScripts/capi-arcus-demo.sh diff --git a/notes/millingw/ClusterAPIScripts/appcred-iris-gaia-red-demo.conf b/notes/millingw/ClusterAPIScripts/appcred-iris-gaia-red-demo.conf new file mode 100644 index 00000000..bd576630 --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/appcred-iris-gaia-red-demo.conf @@ -0,0 +1,11 @@ +[Global] +auth-url=https://arcus.openstack.hpc.cam.ac.uk:5000 +region="RegionOne" +application-credential-id="**REDACTED**" +application-credential-secret="**REDACTED**" + +[LoadBalancer] +use-octavia=true +floating-network-id=d5560abe-c5d5-4653-a2f7-59636448f8fe +network-id=37ad320e-18e7-4fac-8538-3232c6eeeec4 + diff --git a/notes/millingw/ClusterAPIScripts/arcus-red-demo.yaml b/notes/millingw/ClusterAPIScripts/arcus-red-demo.yaml new file mode 100644 index 00000000..0403285d --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/arcus-red-demo.yaml @@ -0,0 +1,176 @@ +apiVersion: v1 +data: + cacert: 
**REDACTED** + clouds.yaml: ***REDACTED** +kind: Secret +metadata: + labels: + clusterctl.cluster.x-k8s.io/move: "true" + name: iris-gaia-red-demo-cloud-config + namespace: default +--- +apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 +kind: KubeadmConfigTemplate +metadata: + name: iris-gaia-red-demo-md-0 + namespace: default +spec: + template: + spec: + files: [] + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + provider-id: openstack:///'{{ instance_id }}' + name: '{{ local_hostname }}' +--- +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: iris-gaia-red-demo + namespace: default +spec: + clusterNetwork: + pods: + cidrBlocks: + - 192.168.0.0/16 + serviceDomain: cluster.local + controlPlaneRef: + apiVersion: controlplane.cluster.x-k8s.io/v1beta1 + kind: KubeadmControlPlane + name: iris-gaia-red-demo-control-plane + infrastructureRef: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 + kind: OpenStackCluster + name: iris-gaia-red-demo +--- +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + name: iris-gaia-red-demo-md-0 + namespace: default +spec: + clusterName: iris-gaia-red-demo + replicas: 7 + selector: + matchLabels: null + template: + spec: + bootstrap: + configRef: + apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 + kind: KubeadmConfigTemplate + name: iris-gaia-red-demo-md-0 + clusterName: iris-gaia-red-demo + failureDomain: nova + infrastructureRef: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 + kind: OpenStackMachineTemplate + name: iris-gaia-red-demo-md-0 + version: 1.30.2 +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta1 +kind: KubeadmControlPlane +metadata: + name: iris-gaia-red-demo-control-plane + namespace: default +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + cloud-provider: external + controllerManager: + extraArgs: + cloud-provider: external + files: [] + initConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + provider-id: openstack:///'{{ instance_id }}' + name: '{{ local_hostname }}' + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + provider-id: openstack:///'{{ instance_id }}' + name: '{{ local_hostname }}' + machineTemplate: + infrastructureRef: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 + kind: OpenStackMachineTemplate + name: iris-gaia-red-demo-control-plane + replicas: 3 + version: 1.30.2 +--- +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: OpenStackCluster +metadata: + name: iris-gaia-red-demo + namespace: default +spec: + apiServerLoadBalancer: + enabled: true + externalNetwork: + id: 57add367-d205-4030-a929-d75617a7c63e + identityRef: + cloudName: iris-gaia-red + name: iris-gaia-red-demo-cloud-config + managedSecurityGroups: + allNodesSecurityGroupRules: + - description: Created by cluster-api-provider-openstack - BGP (calico) + direction: ingress + etherType: IPv4 + name: BGP (Calico) + portRangeMax: 179 + portRangeMin: 179 + protocol: tcp + remoteManagedGroups: + - controlplane + - worker + - description: Created by cluster-api-provider-openstack - IP-in-IP (calico) + direction: ingress + etherType: IPv4 + name: IP-in-IP (calico) + protocol: "4" + remoteManagedGroups: + - controlplane + - worker + managedSubnets: + - cidr: 10.6.0.0/24 + dnsNameservers: + - 8.8.8.8 +--- +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: OpenStackMachineTemplate +metadata: + name: iris-gaia-red-demo-control-plane + namespace: default +spec: + 
template: + spec: + flavor: gaia.vm.cclake.4vcpu + image: + filter: + name: Ubuntu-Jammy-22.04-20240514-kube-1.30.2 + sshKeyName: iris-malcolm-kube-test-keypair + rootVolume: + sizeGiB: 100 +--- +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: OpenStackMachineTemplate +metadata: + name: iris-gaia-red-demo-md-0 + namespace: default +spec: + template: + spec: + flavor: gaia.vm.cclake.54vcpu + image: + filter: + name: Ubuntu-Jammy-22.04-20240514-kube-1.30.2 + sshKeyName: iris-malcolm-kube-test-keypair + rootVolume: + sizeGiB: 200 diff --git a/notes/millingw/ClusterAPIScripts/capi-arcus-demo.sh b/notes/millingw/ClusterAPIScripts/capi-arcus-demo.sh new file mode 100644 index 00000000..ec005cef --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/capi-arcus-demo.sh @@ -0,0 +1,32 @@ +#! /bin/bash + +#source /tmp/env.rc appcred-rundeckdemo01-clouds.yaml openstack + +b64encode(){ + # Check if wrap is supported. Otherwise, break is supported. + if echo | base64 --wrap=0 &> /dev/null; then + base64 --wrap=0 $1 + else + base64 --break=0 $1 + fi +} + +export OPENSTACK_CLOUD=iris-gaia-red +export OPENSTACK_CLOUD_YAML_B64=$( cat arcus-red.yaml | b64encode ) +export OPENSTACK_CLOUD_CACERT_B64=$( cat arcus-openstack-hpc-cam-ac-uk-chain.pem | b64encode ) +export OPENSTACK_FAILURE_DOMAIN=nova +# export OPENSTACK_EXTERNAL_NETWORK_ID=dcb035587-60e2-48eb-ac97-ff5fa38084eba +export OPENSTACK_EXTERNAL_NETWORK_ID=57add367-d205-4030-a929-d75617a7c63e +export OPENSTACK_DNS_NAMESERVERS=8.8.8.8 +export OPENSTACK_CONTROL_PLANE_MACHINE_FLAVOR=gaia.vm.cclake.4vcpu +export OPENSTACK_NODE_MACHINE_FLAVOR=gaia.vm.cclake.54vcpu +export OPENSTACK_IMAGE_NAME=Ubuntu-Jammy-22.04-20240514-kube-1.30.2 +export OPENSTACK_SSH_KEY_NAME=iris-malcolm-kube-test-keypair + +export KUBERNETES_VERSION=1.30.2 + +# optional +export CLUSTER_NAME=iris-gaia-red-demo +export CONTROL_PLANE_MACHINE_COUNT=3 +export WORKER_MACHINE_COUNT=2 + From ef61c7db9fd77a971da6abf8ccae028e257b68d9 Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Mon, 13 Jan 2025 17:04:16 +0000 Subject: [PATCH 04/12] Script for autobuild of a ClusterAPI cluster Script for building a kubernetes cluster in an OpenStack project. Assumes control images, management cluster etc have already been provisioned Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- .../ClusterAPIScripts/build_my_cluster.sh | 229 ++++++++++++++++++ 1 file changed, 229 insertions(+) create mode 100644 notes/millingw/ClusterAPIScripts/build_my_cluster.sh diff --git a/notes/millingw/ClusterAPIScripts/build_my_cluster.sh b/notes/millingw/ClusterAPIScripts/build_my_cluster.sh new file mode 100644 index 00000000..9382f82c --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/build_my_cluster.sh @@ -0,0 +1,229 @@ +#!/bin/bash + +# we make the following assumptions: +# KUBECONFIG needs to be set to point at the ClusterAPI management cluster +# CLUSTER_SPECIFICATION_FILE is a ClusterAPI yaml file containing templates for the cluster we want to build +# CLUSTER_NAME is consistent with cluster name references in the specification file +# CINDER_SECRETS_FILE contains cinder config details +# CLUSTER_CREDENTIAL_FILE is configured to use an existing OpenStack network, so that we don't need to look up a network id +# TODO handle dynamic network creation; if we're using ceph, better to use a preconfigured network cos otherwise its all a bit of a nightmare + +# TODO read this all from a yaml config file, instead of specifying it all here! 
+export KUBECONFIG=/home/rocky/openstack/k8sdir/config +export CLUSTER_NAME=iris-gaia-red-ceph +#export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph.yaml +#export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph-secret.yaml +export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph-file-test.yaml +export CLUSTER_CREDENTIAL_FILE=appcred-iris-gaia-red-fixed-bootstrap.conf +export CINDER_SECRETS_FILE=cinder-values.yaml + +USE_MANILA=true +MANILA_PROTOCOLS_FILE=./manila-csi-kubespray/values.yaml +MANILA_SECRETS_FILE=./manila-csi-kubespray/secrets.yaml +MANILA_STORAGE_CLASS_FILE=./manila-csi-kubespray/sc.yaml +DEFAULT_STORAGE_CLASS=manila + +# check all our expected environment variables are set +if [ -z "${KUBECONFIG}" ]; then + echo environment variable KUBECONFIG not set + exit 1 +fi + +if [ -z "${CLUSTER_NAME}" ]; then + echo environment variable CLUSTER_NAME not set + exit 1 +fi + +if [ -z "${CLUSTER_SPECIFICATION_FILE}" ]; then + echo environment variable CLUSTER_SPECIFICATION_FILE not set + exit 1 +fi + +if [ -z "${CLUSTER_CREDENTIAL_FILE}" ]; then + echo environment variable CLUSTER_CREDENTIAL_FILE not set + exit 1 +fi + +if [ -z "${CINDER_SECRETS_FILE}" ]; then + echo environment variable CINDER_SECRETS_FILE not set + exit 1 +fi + +# check all the input config files exist + +if [ ! -f "${KUBECONFIG}" ]; then + echo file ${KUBECONFIG} not found + exit 1 +fi + +if [ ! -f "${CLUSTER_SPECIFICATION_FILE}" ]; then + echo file ${CLUSTER_SPECIFICATION_FILE} not found + exit 1 +fi + +if [ ! -f "${CLUSTER_CREDENTIAL_FILE}" ]; then + echo file ${CLUSTER_CREDENTIAL_FILE} not found + exit 1 +fi + +if [ ! -f "${CINDER_SECRETS_FILE}" ]; then + echo file ${CINDER_SECRETS_FILE} not found + exit 1 +fi + + +# check manila-specific environment variables and files +if [ $USE_MANILA = true ]; then + + if [ -z "${MANILA_PROTOCOLS_FILE}" ]; then + echo environment variable MANILA_PROTOCOLS_FILE not set + exit 1 + fi + + if [ -z "${MANILA_SECRETS_FILE}" ]; then + echo environment variable MANILA_SECRETS_FILE not set + exit 1 + fi + + if [ -z "${MANILA_PROTOCOLS_FILE}" ]; then + echo environment variable MANILA_STORAGE_CLASS_FILE not set + exit 1 + fi + + if [ ! -f "${MANILA_PROTOCOLS_FILE}" ]; then + echo file ${MANILA_PROTOCOLS_FILE} not found + exit 1 + fi + + if [ ! -f "${MANILA_SECRETS_FILE}" ]; then + echo file ${MANILA_SECRETS_FILE} not found + exit 1 + fi + + if [ ! -f "${MANILA_PROTOCOLS_FILE}" ]; then + echo file ${MANILA_STORAGE_CLASS_FILE} not set + exit 1 + fi +fi + + + +# create the cluster via the management cluster +echo building the cluster ... +kubectl apply -f ${CLUSTER_SPECIFICATION_FILE} + +# wait a couple of minutes, then loop loooking for the first control plane machine +echo Waiting for cluster to initialise ... 
+sleep 120 + +echo Looping till first control plane machine is available +control_plane_status='False' +until [ $control_plane_status == 'True' ]; +do + sleep 60 + control_plane_status=$(clusterctl describe cluster ${CLUSTER_NAME} --grouping=false | grep -E "Machine/${CLUSTER_NAME}-control-plane" | awk -v OFS='\t' 'FNR == 1{print $3}') + echo $control_plane_status +done + +# we should be able to get the cluster's KUBECONFIG file now +clusterctl get kubeconfig ${CLUSTER_NAME} > ${CLUSTER_NAME}.kubeconfig + + +# +# check we can get the initial set of nodes, otherwise we need to wait a bit longer +# we should get at least our first control plane machine listed, with role 'control-plane' +echo looping till control plane nodes responding +control_plane_ready=false +until [ $control_plane_ready = true ]; +do + sleep 60 + get_nodes=$(kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get nodes | awk -v OFS='\t' 'FNR == 2{print $3}') + echo $get_nodes + + # if it's ready, get_nodes should contain 'control-plane', otherwise keep looping + if [ $get_nodes == 'control-plane' ]; then + control_plane_ready=true + fi + echo $control_plane_ready +done + + + + +# start installing the control layer components +echo installing calico components + +curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml -O +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f calico.yaml + +# create ceph secret before we build our worker nodes; +# config will use this to kernel mount our ceph shares +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f cephx-secret.yaml + + +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig create secret -n kube-system generic cloud-config --from-file=cloud.conf=${CLUSTER_CREDENTIAL_FILE} +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/cloud-controller-manager-roles.yaml +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/cloud-controller-manager-role-bindings.yaml +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/openstack-cloud-controller-manager-ds.yaml +# now we loop and wait till the cluster reports success +echo waiting for cluster completion +cluster_status='False' +until [ $cluster_status == 'True' ]; +do + sleep 60 + cluster_status=$( clusterctl describe cluster ${CLUSTER_NAME} --grouping=false | awk -v OFS='\t' 'FNR == 2{print $2}' ) + echo $cluster_status +done +echo Cluster creation complete + +# we assume all OpenStack systems will have a Cinder service +# (is this a safe assumption?) 
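# Note (added suggestion, not part of the original script): the cinder install below pulls its
# chart from the 'cpo' repo, but that repo is only added further down in the Manila section.
# If it is not already present on this machine, add it first, e.g.:
# helm repo add cpo https://kubernetes.github.io/cloud-provider-openstack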
+echo Installing cinder driver +helm install --namespace=kube-system -f ${CINDER_SECRETS_FILE} --kubeconfig=./${CLUSTER_NAME}.kubeconfig cinder-csi cpo/openstack-cinder-csi + +echo Completed Cluster creation and installed Cinder storage classes + + +# Ceph / Manila installation +if [ $USE_MANILA = true ]; then +echo Installing Manilla storage class + +# install the ceph csi driver +# followed notes at https://gitlab.developers.cam.ac.uk/pfb29/manila-csi-kubespray + +helm repo add ceph-csi https://ceph.github.io/csi-charts +helm --kubeconfig=./${CLUSTER_NAME}.kubeconfig install --namespace kube-system ceph-csi-cephfs ceph-csi/ceph-csi-cephfs + +# install the manila csi driver +helm repo add cpo https://kubernetes.github.io/cloud-provider-openstack +helm install --kubeconfig=./${CLUSTER_NAME}.kubeconfig --namespace kube-system manila-csi cpo/openstack-manila-csi -f ${MANILA_PROTOCOLS_FILE} + +# configure our access credentials for the manila service +kubectl apply --kubeconfig=./${CLUSTER_NAME}.kubeconfig -f ${MANILA_SECRETS_FILE} + +# create a storage class to let us use Manila from kubernetes +kubectl apply --kubeconfig=./${CLUSTER_NAME}.kubeconfig -f ${MANILA_STORAGE_CLASS_FILE} + +# make Manila the default storage class, if specified +if [ $DEFAULT_STORAGE_CLASS == 'manila' ]; then +echo Making manila the default storage class +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig patch storageclass csi-manila-cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' +fi + +echo Manila installation complete + +fi + +# TODO - wait for our workers to become available? + +echo Looping till workers are available +worker_nodes_status='False' +until [ $worker_nodes_status == 'True' ]; +do + sleep 60 + worker_nodes_status=$(clusterctl describe cluster ${CLUSTER_NAME} --grouping=false | grep -E "MachineDeployment" | awk -v OFS='\t' '{print $2}') + echo $worker_nodes_status +done + + + From bc6d1f73bcfb39b043fe0d858f504ec02cf5ef2c Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Mon, 13 Jan 2025 17:25:09 +0000 Subject: [PATCH 05/12] example worker config Example worker config with ceph share kernel-mounted into the node Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- .../KubeadmConfigTemplate.yaml | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 notes/millingw/ClusterAPIScripts/KubeadmConfigTemplate.yaml diff --git a/notes/millingw/ClusterAPIScripts/KubeadmConfigTemplate.yaml b/notes/millingw/ClusterAPIScripts/KubeadmConfigTemplate.yaml new file mode 100644 index 00000000..4cd8a198 --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/KubeadmConfigTemplate.yaml @@ -0,0 +1,22 @@ +kind: KubeadmConfigTemplate +metadata: + name: iris-gaia-red-ceph-md-0 + namespace: default +spec: + template: + spec: + mounts: [] + preKubeadmCommands: ["apt-get update;", "apt-get install ceph-common -y;", "mkdir -p /mnt/kubernetes_scratch_share", "echo 10.4.200.9:6789,10.4.200.13:6789,10.4.200.17:6789,10.4.200.25:6789,10.4.200.26:6789:/volumes/_nogroup/280b44fc-d423-4496-8fb8-79bfc1f58b97/35e407e9-a34b-4c64-b480-3380002d64f8 /mnt/kubernetes_scratch_share ceph name=kubernetes-scratch-share,noatime,_netdev 0 2 >> /etc/fstab"] + files: + - path: /etc/ceph/ceph.conf + content: | + [global] + fsid = a900cf30-f8a3-42bf-98d6-af7ce92f1a1a + mon_host = [v2:10.4.200.13:3300/0,v1:10.4.200.13:6789/0] [v2:10.4.200.9:3300/0,v1:10.4.200.9:6789/0] [v2:10.4.200.17:3300/0,v1:10.4.200.17:6789/0] 
[v2:10.4.200.26:3300/0,v1:10.4.200.26:6789/0] [v2:10.4.200.25:3300/0,v1:10.4.200.25:6789/0] + + - path: /etc/ceph/ceph.client.kubernetes-scratch-share.keyring + content: | + [client.kubernetes-scratch-share] + key = **REDACTED** + + postKubeadmCommands: ["sudo mount -a"] \ No newline at end of file From 329b199d6a665ceada720543aeead42f31df891a Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Tue, 21 Jan 2025 15:48:38 +0000 Subject: [PATCH 06/12] More output, better config More verbose output; config environment variables now read from a resource file Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- .../ClusterAPIScripts/build_my_cluster.sh | 67 ++++++++++++------- 1 file changed, 44 insertions(+), 23 deletions(-) diff --git a/notes/millingw/ClusterAPIScripts/build_my_cluster.sh b/notes/millingw/ClusterAPIScripts/build_my_cluster.sh index 9382f82c..9b962899 100644 --- a/notes/millingw/ClusterAPIScripts/build_my_cluster.sh +++ b/notes/millingw/ClusterAPIScripts/build_my_cluster.sh @@ -9,19 +9,34 @@ # TODO handle dynamic network creation; if we're using ceph, better to use a preconfigured network cos otherwise its all a bit of a nightmare # TODO read this all from a yaml config file, instead of specifying it all here! -export KUBECONFIG=/home/rocky/openstack/k8sdir/config -export CLUSTER_NAME=iris-gaia-red-ceph -#export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph.yaml -#export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph-secret.yaml -export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph-file-test.yaml -export CLUSTER_CREDENTIAL_FILE=appcred-iris-gaia-red-fixed-bootstrap.conf -export CINDER_SECRETS_FILE=cinder-values.yaml - -USE_MANILA=true -MANILA_PROTOCOLS_FILE=./manila-csi-kubespray/values.yaml -MANILA_SECRETS_FILE=./manila-csi-kubespray/secrets.yaml -MANILA_STORAGE_CLASS_FILE=./manila-csi-kubespray/sc.yaml -DEFAULT_STORAGE_CLASS=manila +#export KUBECONFIG=/home/rocky/openstack/k8sdir/config +#export CLUSTER_NAME=iris-gaia-red-ceph +##export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph.yaml +##export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph-secret.yaml +#export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph-file-test.yaml +#export CLUSTER_CREDENTIAL_FILE=appcred-iris-gaia-red-fixed-bootstrap.conf +#export CINDER_SECRETS_FILE=cinder-values.yaml + +#USE_MANILA=true +#MANILA_PROTOCOLS_FILE=./manila-csi-kubespray/values.yaml +#MANILA_SECRETS_FILE=./manila-csi-kubespray/secrets.yaml +#MANILA_STORAGE_CLASS_FILE=./manila-csi-kubespray/sc.yaml +#DEFAULT_STORAGE_CLASS=manila + +# setup the environment variables for our build +source cluster_config.rc + +echo KUBECONFIG $KUBECONFIG +echo CLUSTER_NAME $CLUSTER_NAME +echo CLUSTER_SPECIFICATION_FILE $CLUSTER_SPECIFICATION_FILE +echo CLUSTER_CREDENTIAL_FILE $CLUSTER_CREDENTIAL_FILE +echo CINDER_SECRETS_FILE $CINDER_SECRETS_FILE + +echo USE_MANILA $USE_MANILA +echo MANILA_PROTOCOLS_FILE $MANILA_PROTOCOLS_FILE +echo MANILA_SECRETS_FILE $MANILA_SECRETS_FILE +echo MANILA_STORAGE_CLASS_FILE $MANILA_STORAGE_CLASS_FILE +echo DEFAULT_STORAGE_CLASS $DEFAULT_STORAGE_CLASS # check all our expected environment variables are set if [ -z "${KUBECONFIG}" ]; then @@ -109,7 +124,7 @@ fi # create the cluster via the management cluster -echo building the cluster ... 
+echo building cluster $CLUSTER_NAME kubectl apply -f ${CLUSTER_SPECIFICATION_FILE} # wait a couple of minutes, then loop loooking for the first control plane machine @@ -122,7 +137,7 @@ until [ $control_plane_status == 'True' ]; do sleep 60 control_plane_status=$(clusterctl describe cluster ${CLUSTER_NAME} --grouping=false | grep -E "Machine/${CLUSTER_NAME}-control-plane" | awk -v OFS='\t' 'FNR == 1{print $3}') - echo $control_plane_status + echo Control plane status: $control_plane_status done # we should be able to get the cluster's KUBECONFIG file now @@ -137,17 +152,19 @@ control_plane_ready=false until [ $control_plane_ready = true ]; do sleep 60 + echo Polling nodes to check if basic services up yet get_nodes=$(kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get nodes | awk -v OFS='\t' 'FNR == 2{print $3}') - echo $get_nodes + #echo $get_nodes # if it's ready, get_nodes should contain 'control-plane', otherwise keep looping if [ $get_nodes == 'control-plane' ]; then control_plane_ready=true fi - echo $control_plane_ready + #echo $control_plane_ready + echo not ready yet, waiting ... done - +echo Nodes responding, installing control layer components # start installing the control layer components @@ -158,7 +175,7 @@ kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f calico.yaml # create ceph secret before we build our worker nodes; # config will use this to kernel mount our ceph shares -kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f cephx-secret.yaml +#kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f cephx-secret.yaml kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig create secret -n kube-system generic cloud-config --from-file=cloud.conf=${CLUSTER_CREDENTIAL_FILE} @@ -170,9 +187,10 @@ echo waiting for cluster completion cluster_status='False' until [ $cluster_status == 'True' ]; do + echo polling cluster status ... sleep 60 cluster_status=$( clusterctl describe cluster ${CLUSTER_NAME} --grouping=false | awk -v OFS='\t' 'FNR == 2{print $2}' ) - echo $cluster_status + #echo Cluster status: $cluster_status done echo Cluster creation complete @@ -215,15 +233,18 @@ echo Manila installation complete fi # TODO - wait for our workers to become available? +# at this point we should have a functional k8s cluster +# but it might take some time for all the workers to become available +# or never, if we asked for too many machines ... 
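# (added note, not part of the original script) if the requested machine count can never be
# satisfied, the loop below will spin forever; a simple guard would be to count polls and give
# up after a limit, e.g. inside the loop:
#   polls=$((polls+1)); [ $polls -ge 120 ] && { echo "gave up waiting for workers"; exit 1; }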
-echo Looping till workers are available +echo Looping till all workers are available worker_nodes_status='False' until [ $worker_nodes_status == 'True' ]; do sleep 60 worker_nodes_status=$(clusterctl describe cluster ${CLUSTER_NAME} --grouping=false | grep -E "MachineDeployment" | awk -v OFS='\t' '{print $2}') - echo $worker_nodes_status + echo worker status: $worker_nodes_status done - +echo Cluster $CLUSTER_NAME creation complete From 6f2511f2547c1bac8c45bddd3797662cc8f2cbec Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Tue, 21 Jan 2025 15:51:04 +0000 Subject: [PATCH 07/12] Added config file Environment variables for the cluster building script Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- notes/millingw/ClusterAPIScripts/cluster_config.rc | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 notes/millingw/ClusterAPIScripts/cluster_config.rc diff --git a/notes/millingw/ClusterAPIScripts/cluster_config.rc b/notes/millingw/ClusterAPIScripts/cluster_config.rc new file mode 100644 index 00000000..71a5d08d --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/cluster_config.rc @@ -0,0 +1,12 @@ +# TODO read this all from a yaml config file, instead of specifying it all here! +export KUBECONFIG=/home/rocky/openstack/k8sdir/config +export CLUSTER_NAME=iris-gaia-red-ceph +export CLUSTER_SPECIFICATION_FILE=capi-iris-gaia-red-ceph-file-test.yaml +export CLUSTER_CREDENTIAL_FILE=appcred-iris-gaia-red-fixed-bootstrap.conf +export CINDER_SECRETS_FILE=cinder-values.yaml + +USE_MANILA=true +MANILA_PROTOCOLS_FILE=./manila-csi-kubespray/values.yaml +MANILA_SECRETS_FILE=./manila-csi-kubespray/secrets.yaml +MANILA_STORAGE_CLASS_FILE=./manila-csi-kubespray/sc.yaml +DEFAULT_STORAGE_CLASS=manila From 204fd76f589e3e18edfd01dd761f7ba2ef6ad67e Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Tue, 21 Jan 2025 16:50:45 +0000 Subject: [PATCH 08/12] Updated with script config notes Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- notes/millingw/ClusterAPIScripts/Readme.MD | 44 +++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/notes/millingw/ClusterAPIScripts/Readme.MD b/notes/millingw/ClusterAPIScripts/Readme.MD index 6d065a69..5896654e 100644 --- a/notes/millingw/ClusterAPIScripts/Readme.MD +++ b/notes/millingw/ClusterAPIScripts/Readme.MD @@ -1 +1,43 @@ -### Placeholder for example ClusterAPI related scripts and things +## ClusterAPI build scripts + +Building a cluster involves multiple steps and lots of configuration files. +Each site that we deploy to is likely to have different storage configurations, networks, credentials +Here I am trying to collect together the set of config files for each site that we are deploying to, and using a single deployment script, build_my_cluster.sh +build_my_cluster.sh assumes that all preparatory work has already been done, ie a management cluster has been created, compatible ClusterAPI images have been created and tested in the target OpenStack environments, and a cluster template has been generated. 
+The following tools must be installed prior to running the script: kubectl, clusterctl, openstack cli + +The script reads a config file, which sets all the necessary environment variables that the script expects: + +export KUBECONFIG= +export CLUSTER_NAME= +export CLUSTER_SPECIFICATION_FILE= +export CLUSTER_CREDENTIAL_FILE= +export CINDER_SECRETS_FILE= Date: Tue, 21 Jan 2025 17:03:07 +0000 Subject: [PATCH 09/12] Markdown formatting Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- notes/millingw/ClusterAPIScripts/Readme.MD | 25 +++++++++++----------- 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/notes/millingw/ClusterAPIScripts/Readme.MD b/notes/millingw/ClusterAPIScripts/Readme.MD index 5896654e..5406db82 100644 --- a/notes/millingw/ClusterAPIScripts/Readme.MD +++ b/notes/millingw/ClusterAPIScripts/Readme.MD @@ -2,36 +2,37 @@ Building a cluster involves multiple steps and lots of configuration files. Each site that we deploy to is likely to have different storage configurations, networks, credentials -Here I am trying to collect together the set of config files for each site that we are deploying to, and using a single deployment script, build_my_cluster.sh +Here I am trying to collect together the set of config files for each site that we are deploying to, and using a single deployment script, build_my_cluster.sh, to make deployment a bit less manual. build_my_cluster.sh assumes that all preparatory work has already been done, ie a management cluster has been created, compatible ClusterAPI images have been created and tested in the target OpenStack environments, and a cluster template has been generated. The following tools must be installed prior to running the script: kubectl, clusterctl, openstack cli The script reads a config file, which sets all the necessary environment variables that the script expects: +``` export KUBECONFIG= export CLUSTER_NAME= export CLUSTER_SPECIFICATION_FILE= export CLUSTER_CREDENTIAL_FILE= export CINDER_SECRETS_FILE= Date: Wed, 22 Jan 2025 11:22:04 +0000 Subject: [PATCH 10/12] More tidying up of notes Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- notes/millingw/DeployClusterAPI.md | 230 +++++++++-------------------- 1 file changed, 71 insertions(+), 159 deletions(-) diff --git a/notes/millingw/DeployClusterAPI.md b/notes/millingw/DeployClusterAPI.md index c696795b..f79ce005 100644 --- a/notes/millingw/DeployClusterAPI.md +++ b/notes/millingw/DeployClusterAPI.md @@ -62,11 +62,13 @@ This turns our starting magnum-created kubernetes cluster into a ClusterAPI mana clusterctl init --infrastructure openstack ``` -Our cluster on Somerville is now our management cluster. +Our cluster on Somerville is now our management cluster. We can use it to deploy and manage Kubernetes clusters on multiple OpenStack sites. +If the management cluster is accidentally deleted, then our worker clusters become independent and will still work, but won't be manageable via ClusterAPI. # Build CAPI image in target OpenStack environment: -Next, we need to build a control image in our target OpenStack environment +Next, we need to build a control image in our target OpenStack environment. The management cluster will use this image to create clusters in the target project. +Prerequisites are an existing Ubuntu image in the target project, and OpenStack credentials with project-level permissions for the target project.
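Before starting a build it is worth confirming those prerequisites with the openstack CLI; the sketch below assumes a clouds.yaml entry named "iris-gaia-red" and reuses the image and flavour names that appear elsewhere in these notes purely as examples.

```
# check the source Ubuntu image and the build flavour exist in the target project
openstack --os-cloud iris-gaia-red image show Ubuntu-Jammy-22.04-20240514
openstack --os-cloud iris-gaia-red flavor show gaia.vm.cclake.26vcpu
```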
Install Packer on command/control VM: @@ -102,7 +104,7 @@ packer init reqs-build.pkr.hcl create packer_var_file.json, edited for arcus red project -Note that I had to add packer_build_ingest security group to arcus project to allow ssh access for packer to build image +Note that I had to add a "packer_build_ingest" security group to the arcus project to allow ssh access for packer to build the image "networks" is existing router in OpenStack project, did not have to create this CUDN-Internet is existing floating ip pool name in gaia red project Had to work out flavor and image name from looking at options in the arcus gaia red OpenStack project and doing some trial VM creations to get good combinations @@ -169,7 +171,7 @@ Notes assume server certificates saved to arcus-openstack-hpc-cam-ac-uk.pem Create environment variable script for configuring clusterctl deployment. Note that a value must be supplied for OPENSTACK_DNS_NAMESERVERS must be supplied for the config file generation; however, it may be necessary to edit or delete this from the generated config file (see below). -(We've seen that on Arcus the value is ignored, but on BSC it is used directly) +(We've seen that on Arcus the value is ignored, but on BSC it is used directly and messes things up) ``` capi-arcus-red-vars.sh: @@ -216,13 +218,13 @@ export KUBECONFIG=/home/rocky/openstack/k8sdir/config ## Create ClusterAPI config -# generate a template file for the new cluster using the environment variables we set -# capi-red.yaml will be an openstack-specific, project specific template file for building a new k8s cluster -# this does not actually create a cluster, just a new template for building a cluster +generate a template file for the new cluster using the environment variables we set +capi-red.yaml will be an openstack-specific, project specific template file for building a new k8s cluster +Note this does not actually create a cluster, just a new template for building a cluster clusterctl generate cluster iris-gaia-red > capi-red.yaml -Note that we can't check the generated yaml file into public github, as it contains (base64-encoded) access credentials for OpenStack +Warning! Note that we can't check the generated yaml file into public github, as it contains (base64-encoded) access credentials for OpenStack The DNS configuration isn't required although the generate script insists that the environment variable is set. You can remove the dns server reference from the config yaml ("dnsNameservers", see below), if not required. (See above note about BSC) @@ -376,12 +378,12 @@ Watch progress clusterctl describe cluster ${CLUSTER_NAME} ``` -The cluster initialises with no available storage classes, therefore applications cannot immediately be deployed. +The cluster initialises with no available storage classes, therefore applications cannot immediately be deployed. +We assume OpenStack systems will always provide a Cinder storage service, so install the Cinder storage driver into our new cluster. # Install cinder driver Install the cinder helm chart - Edit cinder-values.yaml to match our deployed cluster. We point it at the secret we already created during the calico installation ``` @@ -419,6 +421,10 @@ Note: it should be possible to automate this through the ClusterAPI template, bu # mount data shares At this point our cluster is ready to use. However, we need to be able to access the GAIA DR3 (and potentially other) data from our services. 
+ +If we used a pre-existing network already configured to use the site-specific storage service network, and configured mount instructions in the worker template, then we shouldn't have anything further to do to access the data. Otherwise, we have some work to do in configuring routers and manually mounting services. + +The following instructions are for Arcus. Other sites will have different requirements. On the arcus deployment, data is held in a separate project ("iris-gaia-data") within the same physical hardware. In the Horizon GUI, select iris-gaia-data in the project list, then navigate to "shares". Identify the required data share, and note the share path and the associated cephx access rule and key. @@ -459,7 +465,46 @@ Filesystem 10.4.200.9:6789,10.4.200.13:6789,10.4.200.17:6789,10.4.200.25:6789,10.4.200.26:6789:/volumes/_nogroup/fa5309a4-1b69-4713-b298-c8d7a479f86f/d53177c6-c45c-4583-9947-d50ab931445c 10G 0 10G 0% /mnt/cephfs ``` -Note to self - write a script to automate the above! +Doing this for each machine in our cluster is clearly not ideal. The ClusterAPI template allows us to specify extended configuration information as follows. +Here, before worker machines join our cluster, we install and configure ceph, and create keyring files for our shares, and create mount entries in /etc/fstab +Then, we force a remount as the worker joins the cluster. (This does assume the ceph network has already been configured, otherwise the worker will likely fail). + +``` +kind: KubeadmConfigTemplate +metadata: + name: iris-gaia-red-ceph-md-0 + namespace: default +spec: + template: + spec: + mounts: [] + preKubeadmCommands: ["apt-get update;", "apt-get install ceph-common -y;", "mkdir -p /mnt/kubernetes_scratch_share", "echo 10.4.200.9:6789,10.4.200.13:67 +89,10.4.200.17:6789,10.4.200.25:6789,10.4.200.26:6789:/volumes/_nogroup/280b44fc-d423-4496-8fb8-79bfc1f58b97/35e407e9-a34b-4c64-b480-3380002d64f8 /mnt/kubernet +es_scratch_share ceph name=kubernetes-scratch-share,noatime,_netdev 0 2 >> /etc/fstab"] + files: + - path: /etc/ceph/ceph.conf + content: | + [global] + fsid = a900cf30-f8a3-42bf-98d6-af7ce92f1a1a + mon_host = [v2:10.4.200.13:3300/0,v1:10.4.200.13:6789/0] [v2:10.4.200.9:3300/0,v1:10.4.200.9:6789/0] [v2:10.4.200.17:3300/0,v1:10.4.200.17:6789/0 +] [v2:10.4.200.26:3300/0,v1:10.4.200.26:6789/0] [v2:10.4.200.25:3300/0,v1:10.4.200.25:6789/0] + + - path: /etc/ceph/ceph.client.kubernetes-scratch-share.keyring + content: | + [client.kubernetes-scratch-share] + key = REDACTED + + postKubeadmCommands: ["sudo mount -a"] + + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + provider-id: openstack:///'{{ instance_id }}' + name: '{{ local_hostname }}' +``` + +(It should be possible to configure other storage types, such as nfs, in a similar fashion) Now that all our workers have the data share mounted, we can access it via a hostPath mount from our pods, eg @@ -481,7 +526,7 @@ The (read-only) DR3 data should now be accessible in the pod at /mnt/dr3_data_sh ## rescale cluster -The management cluster is used to view active workers and rescale a running worker cluster, via the machinedeployments class. +The management cluster can be used to view active workers and rescale a running worker cluster, via the machinedeployments class. e.g. 
``` @@ -490,12 +535,16 @@ NAME CLUSTER REPLICAS READY UPDATED UNAV bsc-gaia-md-0 bsc-gaia 3 3 3 0 Running 25h v1.30.2 iris-gaia-red-ceph-md-0 iris-gaia-red-ceph 4 4 4 0 Running 22d v1.30.2 iris-gaia-red-demo-md-0 iris-gaia-red-demo 7 7 7 0 Running 6d2h v1.30.2 +``` -$ kubectl scale machinedeployment iris-gaia-red-demo-md-0 --replicas=9 +Increase number of workers for one of our clusters +``` +$ kubectl scale machinedeployment iris-gaia-red-demo-md-0 --replicas=9 ``` -Note that with our current deployment, new VMs will not automatically get the ceph mounts. This will require manual intervention to perform the ceph configuration +If we specified the storage mounts in our cluster template, then these should automatically be applied when the new worker joins the cluster. +However, if we created the mounts manually, this will need to be repeated manually for the new worker. # Deleting a cluster @@ -547,12 +596,11 @@ The deployed clusters will still function independently, assuming we have their However, we should do everything to avoid this happening ... -## Ceph and Manila CSI configuration - -Warning! Work in progress from this point ... +## Manila configuration +On Arcus and Somerville we have access to a Manila service. This effectively acts as a higher level storage service, and supports multiple protocols. +Currently these sites are configured to support ceph via Manila, so we can install the manila storage driver into our cluster. - -# install the ceph csi driver +# First install the ceph csi driver as manila will need it # followed notes at https://gitlab.developers.cam.ac.uk/pfb29/manila-csi-kubespray ``` @@ -629,6 +677,9 @@ parameters: kubectl apply --kubeconfig=./${CLUSTER_NAME}.kubeconfig -f sc.yaml ``` +We now have the manila storage driver installed. +We can make this the default storage class, so any user volumes are automatically created as ceph shares instead of cinder volumes + # make manila the default storage class ``` @@ -644,150 +695,11 @@ csi-cinder-sc-retain cinder.csi.openstack.org Retain csi-manila-cephfs (default) cephfs.manila.csi.openstack.org Delete Immediate false 5d5 ``` -# test access to cephfs service -In Horizon GUI, manually create a share. Create a cephx access rule, then copy the access key and full storage path - -Create a secret containing the access key - -ceph-secret.yaml -``` -apiVersion: v1 -kind: Secret -metadata: - name: ceph-secret -stringData: - key: **** -``` -kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f ceph-secret.yaml - -Create a test pod that mounts the ceph share as a volume. 
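The exact command used to change the default is elided in the hunk above; the usual approach is to set the `storageclass.kubernetes.io/is-default-class` annotation on the manila class (and clear it from any class that currently carries it), roughly as sketched below using the class name from the listing.

```
kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig patch storageclass csi-manila-cephfs \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```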
The ceph share path needs to be separated into a list of monitor addresses and the relative path, eg - -pod.yaml - -``` ---- -apiVersion: v1 -kind: Pod -metadata: - name: test-cephfs-share-pod -spec: - containers: - - name: web-server - image: nginx - imagePullPolicy: IfNotPresent - volumeMounts: - - name: testpvc - mountPath: /var/lib/www - - name: cephfs - mountPath: "/mnt/cephfs" - volumes: - - name: testpvc - persistentVolumeClaim: - claimName: test-cephfs-share-pvc - readOnly: false - - name: cephfs - cephfs: - monitors: - - 10.4.200.9:6789 - - 10.4.200.13:6789 - - 10.4.200.17:6789 - - 10.4.200.25:6789 - - 10.4.200.26:6789 - secretRef: - name: ceph-secret - readOnly: false - path: "/volumes/_nogroup/ca890f73-3e33-4e07-879c-f7ec0f5a8a17/52bcd13b-a358-40f0-9ffa-4334eb1e06ae" -``` - -Example uses nginx, so install that: - -``` -helm install --kubeconfig=./${CLUSTER_NAME}.kubeconfig nginx bitnami/nginx -``` - -deploy the pod -``` -kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f manila-csi-kubespray/pod.yaml -``` - -Inspect the pod to verify that the ceph share was successfully mounted - -# test jhub deployment, check where user areas get created - -deploy jhub, check where user area is created - -``` -helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/ -helm --kubeconfig=./${CLUSTER_NAME}.kubeconfig upgrade --install jhub jupyterhub/jupyterhub --version=3.3.8 -``` - -# port forward on control VM -``` -kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig --namespace=default port-forward service/proxy-public 8080:http -``` - -# port forward on laptop: -ssh -i "gaia_jade_test_malcolm.pem" -L 8080:127.0.0.1:8080 rocky@192.41.122.174 -browse to 127.0.0.1:8080 and login, eg as user 'hhh' - -# on control VM, list pvs/pvcs -kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get pv -NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE 6h56m -pvc-8b970f5c-440b-48f8-ae19-4fb35d20e85f 10Gi RWO Delete Bound default/claim-hhh csi-manila-cephfs 6h51m -pvc-7d104b45-7efe-4250-b9fe-5bf441eb65a9 1Gi RWO Delete Bound default/hub-db-dir csi-manila-cephfs - -kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get pvc -NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE -claim-hhh Bound pvc-8b970f5c-440b-48f8-ae19-4fb35d20e85f 10Gi RWO csi-manila-cephfs 6h52m -hub-db-dir Bound pvc-7d104b45-7efe-4250-b9fe-5bf441eb65a9 1Gi RWO csi-manila-cephfs 6h58m - -## Thoughts on automation and migration - -Each system that we deploy to will have different networking setup, storage services, image names, machine flavour. -Each system requires that a ClusterAPI image be built in that system from an Ubuntu image already present in that system. -For each system, we generate a configuration file using clusterctl generate. -Getting a working generation image and working combinations of images / flavours likely to be a trial and error process, little prospect for automation -Once we have a working template for a given site, that template can be reused for that site, but that site only. -Given a particular site with a working template, it should be possibe to automate creation of a cluster at that site. -Each site will require specific post-creation configuration, e.g. ceph mounts on Arcus, nfs(?) mounts on BSC - -Manual stages: -Install packer, clusterctl, server certificates etc. -Manually build / test image in target environment, get working combinations of flavours and boot disk sizes. -Generate template file, adjust any arguments. 
-Once we've got this far, can automate using the template. -Note that we can't check templates into a repo, as they contain security information - -Automated stages: - -kubectl apply template file -clusterctl describe until ready -get kubeconfig file -apply calico -use openstack to lookup network id for new network (how do we get cluster name? from environment variable?) -build application secret conf file -build secret in target environment -complete setup -install cinder storage classes +At this point our new cluster should be ready to accept kubernetes services in the normal fashion, using the KUBECONFIG file that was generated during the cluster creation. -do site-specific post-installation: -get list of worker names via kubectl get nodes -install ceph client on each worker node -configure ceph on each worker node -- mount ceph shares on Arcus. need list of shares to mount, lookup keys and create share mount on each worker VM -- attach shared volumes on Somerville, BSC? ) -- modify /etc/fstab rather than configuring from directory? -Things to try: -Automatic configuration of ceph network on arcus -attach manila shares to pod instead of using ceph mounts (wont be available at every site) -Generic scripts: -lookup network id, build conf file -lookup keys for ceph shares -install list of ceph shares on VMs -get list of worker node names and ip addresses From 1bdc69aa4db237478ca4b3f4b55ef88e4b4675fb Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Wed, 22 Jan 2025 11:25:54 +0000 Subject: [PATCH 11/12] Added notes for manila test Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- .../20250122-manila-test.txt | 102 ++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 notes/millingw/ClusterAPIScripts/20250122-manila-test.txt diff --git a/notes/millingw/ClusterAPIScripts/20250122-manila-test.txt b/notes/millingw/ClusterAPIScripts/20250122-manila-test.txt new file mode 100644 index 00000000..56a8ef85 --- /dev/null +++ b/notes/millingw/ClusterAPIScripts/20250122-manila-test.txt @@ -0,0 +1,102 @@ +# test access to cephfs service +We should be able to access ceph shares directly in a pod. +However, as of 2025-01-22 this wasn't working! + +In Horizon GUI, manually create a share. Create a cephx access rule, then copy the access key and full storage path + +Create a secret containing the access key + +ceph-secret.yaml +``` +apiVersion: v1 +kind: Secret +metadata: + name: ceph-secret +stringData: + key: **** +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f ceph-secret.yaml + +Create a test pod that mounts the ceph share as a volume. 
The ceph share path needs to be separated into a list of monitor addresses and the relative path, eg + +pod.yaml + +``` +--- +apiVersion: v1 +kind: Pod +metadata: + name: test-cephfs-share-pod +spec: + containers: + - name: web-server + image: nginx + imagePullPolicy: IfNotPresent + volumeMounts: + - name: testpvc + mountPath: /var/lib/www + - name: cephfs + mountPath: "/mnt/cephfs" + volumes: + - name: testpvc + persistentVolumeClaim: + claimName: test-cephfs-share-pvc + readOnly: false + - name: cephfs + cephfs: + monitors: + - 10.4.200.9:6789 + - 10.4.200.13:6789 + - 10.4.200.17:6789 + - 10.4.200.25:6789 + - 10.4.200.26:6789 + secretRef: + name: ceph-secret + readOnly: false + path: "/volumes/_nogroup/ca890f73-3e33-4e07-879c-f7ec0f5a8a17/52bcd13b-a358-40f0-9ffa-4334eb1e06ae" +``` + +Example uses nginx, so install that: + +``` +helm install --kubeconfig=./${CLUSTER_NAME}.kubeconfig nginx bitnami/nginx +``` + +deploy the pod +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig apply -f manila-csi-kubespray/pod.yaml +``` + +Inspect the pod to verify that the ceph share was successfully mounted + +# test jhub deployment, check where user areas get created + +deploy jhub, check where user area is created + +``` +helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/ +helm --kubeconfig=./${CLUSTER_NAME}.kubeconfig upgrade --install jhub jupyterhub/jupyterhub --version=3.3.8 +``` + +# port forward on control VM +``` +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig --namespace=default port-forward service/proxy-public 8080:http +``` + +# port forward on laptop: +ssh -i "gaia_jade_test_malcolm.pem" -L 8080:127.0.0.1:8080 rocky@192.41.122.174 +browse to 127.0.0.1:8080 and login, eg as user 'hhh' + +# on control VM, list pvs/pvcs +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get pv +NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE 6h56m +pvc-8b970f5c-440b-48f8-ae19-4fb35d20e85f 10Gi RWO Delete Bound default/claim-hhh csi-manila-cephfs 6h51m +pvc-7d104b45-7efe-4250-b9fe-5bf441eb65a9 1Gi RWO Delete Bound default/hub-db-dir csi-manila-cephfs + +kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get pvc +NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE +claim-hhh Bound pvc-8b970f5c-440b-48f8-ae19-4fb35d20e85f 10Gi RWO csi-manila-cephfs 6h52m +hub-db-dir Bound pvc-7d104b45-7efe-4250-b9fe-5bf441eb65a9 1Gi RWO csi-manila-cephfs 6h58m + + + From 531ffd8309479715c8c669de1d518dd3e83a43ad Mon Sep 17 00:00:00 2001 From: millingw <13414895+millingw@users.noreply.github.com> Date: Wed, 22 Jan 2025 11:28:12 +0000 Subject: [PATCH 12/12] Update Readme.MD Signed-off-by: millingw <13414895+millingw@users.noreply.github.com> --- notes/millingw/ClusterAPIScripts/Readme.MD | 2 ++ 1 file changed, 2 insertions(+) diff --git a/notes/millingw/ClusterAPIScripts/Readme.MD b/notes/millingw/ClusterAPIScripts/Readme.MD index 5406db82..a11aafbe 100644 --- a/notes/millingw/ClusterAPIScripts/Readme.MD +++ b/notes/millingw/ClusterAPIScripts/Readme.MD @@ -38,6 +38,8 @@ Cluster creation can be monitored with clusterctl, ie clusterctl describe cluste Note that a cluster may be ready for use before all workers are ready; the script may loop indefinitely if the target project can't provide the requested number of workers. +On successfull completion of the script, a KUBECONFIG file should be output that can be used to install services on the newly created cluster. 
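A quick way to confirm the generated KUBECONFIG is usable is to point kubectl at it and list the nodes; this sketch assumes the file is named ${CLUSTER_NAME}.kubeconfig as elsewhere in these notes.

```
kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get nodes
kubectl --kubeconfig=./${CLUSTER_NAME}.kubeconfig get pods -A
```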
+ The resulting cluster and KUBECONFIG file can then be used to install kubernetes services in the usual fashion. The intention is to maintain a set of production scripts for each deployment site, with a separate master configuration file for each site to be sourced by the build script.