Operator v2: tracking features #334

Closed
20 of 25 tasks
WanzenBug opened this issue Sep 7, 2022 · 11 comments

@WanzenBug
Member

WanzenBug commented Sep 7, 2022

We've recently started work on Operator v2

This is intended as a list of features that need to be ported from v1, or features we want to add in v2:


Note: this list is not complete. If there is something to be added, please comment below

@WanzenBug WanzenBug added this to the 2.0 milestone Sep 7, 2022
@WanzenBug WanzenBug self-assigned this Sep 7, 2022
@phoenix-bjoern
Contributor

@WanzenBug It would be a great chance to add a migration script to the new K8s backend and make it mandatory for operator v2 :-)
Of course, I can also create a separate issue if you prefer to track that independently.

@bc185174
Contributor

Consider using Kustomize as the default deployment tool. It allows for greater control, and from a maintainability point of view it's simpler to patch resources than to template them.

A good example of where this is used is https://github.com/kubernetes-sigs/node-feature-discovery

@WanzenBug
Member Author

> Consider using Kustomize as the default deployment tool

On that front, I can report that the v2 branch is already using kustomize, both as a way to deploy the operator and as a way to customize the actually deployed resources. Basically, you can attach kustomize patches to the resources managed by the operator.

We are still thinking about adding some form of Helm chart, since a lot of users are still used to that.
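
As a rough sketch of what a kustomize-based deployment could look like, a `kustomization.yaml` can pull in the operator's `config/default` directory as a remote resource. The `ref=v2` value is an assumption made for illustration; substitute whichever branch or tag you actually want to deploy.

```yaml
# kustomization.yaml: minimal sketch, not an official manifest.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # "ref=v2" is an assumed branch name; pin a release tag if one is available.
  - https://github.com/piraeus-datastore/piraeus-operator//config/default?ref=v2
```

Applying it with `kubectl apply -k .` from the directory containing this file should deploy the operator; local patches can then be layered on in the same file.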

@WanzenBug WanzenBug pinned this issue Oct 19, 2022
@bc185174
Contributor

Along with the registry, could we configure the image pull secrets, pull policy and image tag? That would make it simpler for end users to automate upgrading the application, for instance by using yq to adjust the image tag in a Makefile. More than happy to help contribute, as always.

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  imageSource: 
    repository: registry.example.com/piraeus
    tag: v1.10.0
    pullPolicy: IfNotPresent
    pullSecrets:
    - "SecretName"

@robinbraemer

robinbraemer commented Jan 15, 2023

Nice work on v2 so far!
There is a problem with the drbd-module-loader init-container and the lvm mounts on Talos nodes.
On the Talos operating system we use extensions to add things like kernel modules, and the directory structure is a little bit different.

I've deployed the operator using kustomize from the config/default directory and have created the following resources:

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstor-cluster
spec:
  nodeSelector:
    node-role.kubernetes.io/linstor: ""
---
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: all-satellites
spec:
  storagePools:
    - name: fs1
      filePool: {}
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: simple-fs
parameters:
  csi.storage.k8s.io/fstype: xfs
#  linstor.csi.linbit.com/autoPlace: "3" # not sure what this does = replica?
  linstor.csi.linbit.com/storagePool: fs1
provisioner: linstor.csi.linbit.com
volumeBindingMode: WaitForFirstConsumer

The drbd-module-loader init-container of the "node pod" tries to hostPath mount /usr/lib/modules, which does not exist on Talos, causing a kubelet error:

MountVolume.SetUp failed for volume "usr-lib-modules" : hostPath type check failed: /usr/lib/modules is not a directory

After manually removing the init-container and building my own image of the operator, I found we also can't hostPath mount /etc/lvm/...:

spec: failed to generate spec: failed to mkdir "/etc/lvm/archive": mkdir /etc/lvm/archive: read-only file system

Only the drbd-reactor container can start.

Since I only want to use file-backed storage pools, I removed all lvm mounts from the linstor-satellite container and rebuilt my operator image.
Now the "node pod" starts successfully and PVs work!

15:10:27.686 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Controller connected and authenticated (10.0.4.220:51996)
15:10:27.894 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Node 'oracle' created.
15:10:27.899 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Storage pool 'DfltDisklessStorPool' created.
15:10:27.970 [DeviceManager] INFO  LINSTOR/Satellite - SYSTEM - Removing all res files from /var/lib/linstor.d
15:10:27.972 [DeviceManager] WARN  LINSTOR/Satellite - SYSTEM - Not calling 'systemd-notify' as NOTIFY_SOCKET is null
15:10:30.600 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Storage pool 'fs1' created.
15:21:39.079 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-963ad95c-0812-46d2-9105-adf0d3812558' created for node 'oracle'.
15:21:39.624 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Primary Resource pvc-963ad95c-0812-46d2-9105-adf0d3812558
15:21:39.624 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Primary bool set on Resource pvc-963ad95c-0812-46d2-9105-adf0d3812558
15:21:39.689 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-963ad95c-0812-46d2-9105-adf0d3812558' updated for node 'oracle'.
15:21:39.851 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-963ad95c-0812-46d2-9105-adf0d3812558' updated for node 'oracle'.
15:21:41.011 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-963ad95c-0812-46d2-9105-adf0d3812558' updated for node 'oracle'.

$ talosctl -n 100.64.6.90 list /var/lib/linstor-pools
NODE          NAME
100.64.6.90   .
100.64.6.90   fs1

$ talosctl -n 100.64.6.90 list /var/lib/linstor-pools/fs1
NODE          NAME
100.64.6.90   .
100.64.6.90   pvc-963ad95c-0812-46d2-9105-adf0d3812558_00000.img

FYI: On Talos /usr/lib/ contains the following:

$ talosctl -n 100.64.6.90 list /usr/lib/
NODE          NAME
100.64.6.90   .
100.64.6.90   cryptsetup
100.64.6.90   engines-1.1
100.64.6.90   libaio.so
....many .so files
100.64.6.90   udev
100.64.6.90   xfsprogs

$ talosctl -n 100.64.6.90 list /lib/modules
NODE          NAME
100.64.6.90   .
100.64.6.90   5.15.86-talos

I can confirm I've installed the drbd and drbd_transport_tcp kernel modules from the Talos drbd extension:

$ talosctl -n 100.64.6.90 read /proc/drbd
version: 9.2.0 (api:2/proto:86-121)
GIT-hash: 71e60591f3d7ea05034bccef8ae362c17e6aa4d1 build by @buildkitsandbox, 2023-01-11 12:22:06
Transports (api:18): tcp (9.2.0)

Afterthoughts (I'm not an expert)

The /usr/lib/modules directory is where many Linux distributions store the loadable kernel modules (drivers) that can be loaded into the kernel at runtime. It is not tied to a particular kernel design; its location is a distribution choice.

Talos does not have a /usr/lib/modules directory. As the listings above show, its modules directory is /lib/modules (containing 5.15.86-talos), and additional modules such as drbd come from Talos extensions.

Because the init-container tries to mount the /usr/lib/modules directory, it will not work as expected on a Talos machine. We may need to modify the script or remove the init-container, but I think having some init container that runs the recommended modprobe drbd usermode_helper=disabled on every linstor node would be very helpful.

refs:
There are efforts to document how to use piraeus-operator on the Talos website: siderolabs/talos#6426
which will help increase awareness of this great storage project.

@DJAlPee already got this working on Talos, but only with the piraeus v1 version from the main branch.
@cf-sewe @frezbo @smira might be able to help support piraeus on this topic.

Talos Slack about getting piraeus-operator v1 to work

@frezbo

frezbo commented Jan 16, 2023

Just a note: loading modules on Talos is disabled. Since Talos is a configuration-driven OS, module loading and its parameters are specified in the machine config, so I guess having an option to disable the init container makes more sense.
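
For illustration, a machine config fragment along these lines would load the two modules mentioned earlier in this thread. This is a sketch that assumes the `machine.kernel.modules` field of the Talos machine configuration and a Talos release that supports per-module `parameters`; check the machine config reference for your version.

```yaml
# Talos machine config fragment (sketch, assumes the drbd extension is installed).
machine:
  kernel:
    modules:
      - name: drbd
        parameters:
          # Same parameter as "modprobe drbd usermode_helper=disabled".
          - usermode_helper=disabled
      - name: drbd_transport_tcp
```

With the modules declared here, nothing inside the satellite pod needs to call modprobe.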

@WanzenBug
Member Author

This sounds like the exact use-case we now have patches for:

apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: no-loader
spec:
  patches:
    - target:
        kind: Pod
        name: satellite
      patch: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: satellite
        spec:
          initContainers:
          - name: drbd-module-loader
            $patch: delete

This disables the init container on all nodes.
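
If only some nodes run Talos, the same patch could probably be limited to them. The sketch below assumes the `nodeSelector` field of `LinstorSatelliteConfiguration`; the label `example.com/os: talos` and the resource name are hypothetical and would need to match labels that actually exist on your nodes.

```yaml
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: no-loader-on-talos   # hypothetical name
spec:
  # Hypothetical label: apply this configuration only to matching nodes.
  nodeSelector:
    example.com/os: talos
  patches:
    - target:
        kind: Pod
        name: satellite
      patch: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: satellite
        spec:
          initContainers:
          - name: drbd-module-loader
            $patch: delete
```

Nodes without the label would keep the drbd-module-loader init container.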

@DJAlPee

DJAlPee commented Jan 17, 2023

This seems to be a pretty nice approach!
In v1 I used operator.satelliteSet.kernelModuleInjectionMode=None in Helm, which seems to have the same effect but is deprecated (but why?).

@WanzenBug
Member Author

> but is deprecated (But why?).

Because you almost always want to use DepsOnly. I don't know if this would work on Talos, but it should ensure that all the other "useful" modules are loaded if they are available: dm-thin, dm-crypt, etc.

@DJAlPee

DJAlPee commented Jan 17, 2023

> Because you almost always want to use DepsOnly. [...]

As @frezbo stated, module loading is disabled in Talos. So we have the "almost" case here 😉
I will update the documentation draft to use None when using the v1 operator. I hope you keep this functionality for v1 and remove it only in v2 😉

@WanzenBug
Member Author

Operator v2 is released.

@WanzenBug WanzenBug unpinned this issue Dec 13, 2023