From bfcf533cd633e054f5d7cfd9eadc1c1f4086319a Mon Sep 17 00:00:00 2001
From: Timofei Larkin
Date: Tue, 17 Dec 2024 19:31:07 +0400
Subject: [PATCH] Create a design-document for the controller (#181)

# Motivation

I started some "R'n'D" (scare quotes intended) on implementing scale up, scale down, self-healing and so on, and quickly realized that coding the member add/member remove and similar steps is the more trivial part of the undertaking. The difficult part is coming up with a working algorithm that can correctly deduce the cluster's state and execute the necessary actions at the right time. To better reason about the controller's algorithm now, and to better develop it going forward, I feel it is important to have good documentation of the current design and the intended next steps, so I started by trying to document the current state of the code.

# Results

This document contains a mermaid flowchart that outlines the reconciliation loop. It is better viewed in [rendered form](https://github.com/aenix-io/etcd-operator/blob/docs/design/docs/DESIGN.md).

Going forward, I envision this document serving at least three purposes:

* Let the developers spot flaws and prompt them to open issues.
* Act as a more detailed form of documentation for advanced users.
* Be a blueprint for implementing anything non-trivial.

## Summary by CodeRabbit

- **Documentation**
  - Updated the design document for the `EtcdCluster` custom resources with a detailed flowchart illustrating the reconciliation process and lifecycle management within a Kubernetes environment.

---------

Co-authored-by: Hidden Marten
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
---
 docs/DESIGN.md    | 81 +++++++++++++++++++++++++++++++++++++++++++++++
 docs/sts-flow.svg |  4 +++
 2 files changed, 85 insertions(+)
 create mode 100644 docs/DESIGN.md
 create mode 100644 docs/sts-flow.svg

diff --git a/docs/DESIGN.md b/docs/DESIGN.md
new file mode 100644
index 00000000..6aba5df3
--- /dev/null
+++ b/docs/DESIGN.md
@@ -0,0 +1,81 @@
+# Design
+
+This document describes the interaction between `EtcdCluster` custom resources and other Kubernetes
+primitives and gives an overview of the underlying implementation.
+
+## Reconciliation flowchart
+
+```mermaid
+flowchart TD
+    Start(Start) --> A[Ensure service.]
+    A --> AA{Are there any\nendpoints?}
+    AA --> |Yes| AAA[Connect to the cluster\nand fetch all statuses.]
+    AAA --> |Got some response| AAAA{All reachable\nmembers have the\nsame cluster ID?}
+    AAAA --> |Yes| AAAAA{Is cluster\nin quorum?}
+    AAAAA --> |Yes| AAAAAA{Are all members \nmanaged by the operator?}
+    AAAAAA --> |Yes| AAAAAAA["`
+        Promote any learners.
+        Ensure configmap with initial cluster matching existing members and cluster state=existing.
+        Ensure StatefulSet with replicas = max member ordinal + 1
+    `"]
+    AAAAAAA --> |OK| AAAAAAAA{Are all\nmembers healthy?}
+    AAAAAAAA --> |Yes| AAAAAAAAA{Are all STS pods present\nin the member list?}
+    AAAAAAAAA --> |Yes| AAAAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?}
+    AAAAAAAAAA -->|Yes| AAAAAAAAAAA[Set cluster\nstatus to ready.]
+    AAAAAAAAAAA --> HappyStop([Stop])
+
+    AAAAAAAAAA --> |No, desired\nsize larger| AAAAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.]
+    AAAAAAAAAAB --> ScaleUpStop([Stop])
+
+    AAAAAAAAAA --> |No, desired\nsize smaller| AAAAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.]
+    AAAAAAAAAAC --> ScaleDownStop([Stop])
+
+    AAAAAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAAAAD[Decrement\nSTS to zero]
+    AAAAAAAAAAD --> ScaleToZeroStop([Stop])
+
+    AAAAAAAA --> |No| AAAAAAAAB1[On timeout evict member.]
+    AAAAAAAAB1 --> AAAAAAAAB2[Delete PVC, ensure ConfigMap with\nmembers + this one and delete pod.]
+
+    AAAAAAAAA --> |No| AAAAAAAAB2
+
+    AAAAAAA -->|Error| AAAAAAAB([Requeue])
+
+    AAAAAA --> |No| AAAAAAB([Not implemented,\nstop.])
+
+    AAAAA --> |No| AAAAAB([Quorum Loss Detected:
+        1. Check for temporary issues:
+           - Network partitions
+           - Pod scheduling problems
+        2. If temporary, wait for recovery
+        3. If permanent:
+           - Alert operators
+           - Document disaster recovery steps
+           - Consider backup restoration])
+
+    AAAA --> |No| AAAAB[Cluster is in\nsplit-brain. Set\nerror status.]
+    AAAAB --> AAAABStop([Stop])
+
+    AAA --> |No members\nreached| AAAB{Is the STS\npresent?}
+    AAAB --> |Yes| AAABA{"`Does it have the correct pod spec?`"}
+    AAABA --> |Yes| AAABAA(["`The statefulset cannot be ready, as the ready and liveness probes must be failing. Hope it becomes ready or wait for user intervention.`"])
+    AAABA --> |No| AAABAB["`Patch the podspec`"]
+
+    AAAB --> |No| AAABB(["`Looks like it was deleted with cascade=orphan. Create it again and see what happens`"])
+
+    AA --> |No| AAB{Is the STS\npresent?}
+    AAB --> |Yes| AABA{Does it have the\ncorrect pod spec?}
+    AABA --> |Yes| AABAA{Is it\nready?}
+    AABAA --> |Yes| AABAAA{Then it must have\nspec.replicas==0\n Is EtcdCluster\n.spec.replicas==0?}
+    AABAAA --> |Yes| AABAAAA([Cluster successfully\nscaled to zero, stop.])
+    AABAAA --> |No| AABAAAB["`
+        Ensure ConfigMap with initial cluster = new,
+        initial cluster peers with single member name-0,
+        increment STS size.
+    `"]
+
+    AABAA --> |No| AABAAB([Stop and wait, either\nit will turn ready soon\nand the next reconcile\nwill move things along,\nor user intervention is\nneeded])
+
+    AABA --> |No| AABAB[Patch the podspec]
+
+    AAB --> |No| AABB[Create configmap, initial state new\ninitial cluster according to spec.\nreplicas, create statefulset.]
+```
diff --git a/docs/sts-flow.svg b/docs/sts-flow.svg
new file mode 100644
index 00000000..aea244ba
--- /dev/null
+++ b/docs/sts-flow.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Figure: docs/sts-flow.svg flowchart. Decision nodes: "Does STS exist?", "Is .spec.replicas==0 in existing STS?", "Is .spec.replicas==0 CR?". Action nodes: "Create STS", "Ensure ConfigMap with initial cluster = new, initial cluster peers with single member `name`-0", "Update STS". Edge labels: Yes, No, Ok, Error. Terminals: Requeue, Stop.]
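
The last steps of the happy path in the mermaid flowchart (deriving the StatefulSet size from member ordinals, then comparing the `EtcdCluster` size with the StatefulSet size) can be sketched in Go as below. This is only an illustration of the branching, not the operator's actual code: the `Action` type, its constants, and the functions `replicasForMembers` and `decideScaling` are hypothetical names, and the sketch assumes member names follow the StatefulSet pod convention `<name>-<ordinal>`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Action is a hypothetical label for the outcome of the size comparison.
type Action string

const (
	ActionSetReady    Action = "set-ready"     // sizes match: set cluster status to ready
	ActionScaleUp     Action = "scale-up"      // ConfigMap + 'member add' + increment STS
	ActionScaleDown   Action = "scale-down"    // member remove + decrement STS + delete PVC
	ActionScaleToZero Action = "scale-to-zero" // last member: just decrement STS to zero
)

// replicasForMembers returns max member ordinal + 1, assuming (hypothetically)
// that every member name ends in "-<ordinal>". An empty list yields 0.
func replicasForMembers(names []string) (int32, error) {
	maxOrdinal := -1
	for _, name := range names {
		idx := strings.LastIndex(name, "-")
		if idx < 0 {
			return 0, fmt.Errorf("member %q has no ordinal suffix", name)
		}
		ordinal, err := strconv.Atoi(name[idx+1:])
		if err != nil {
			return 0, fmt.Errorf("member %q: %w", name, err)
		}
		if ordinal > maxOrdinal {
			maxOrdinal = ordinal
		}
	}
	return int32(maxOrdinal + 1), nil
}

// decideScaling mirrors the four outgoing edges of the
// "Is the EtcdCluster size equal to the StatefulSet size?" node.
func decideScaling(desired, current int32) Action {
	switch {
	case desired == current:
		return ActionSetReady
	case desired == 0 && current == 1:
		// Checked before the generic scale-down case, as in the flowchart.
		return ActionScaleToZero
	case desired > current:
		return ActionScaleUp
	default:
		return ActionScaleDown
	}
}

func main() {
	current, err := replicasForMembers([]string{"etcd-0", "etcd-1", "etcd-2"})
	if err != nil {
		panic(err)
	}
	for _, desired := range []int32{3, 5, 1} {
		fmt.Printf("desired=%d current=%d -> %s\n", desired, current, decideScaling(desired, current))
	}
}
```

Keeping the decision a pure function of the two sizes makes each edge of the flowchart easy to unit-test separately from the Kubernetes and etcd API calls that carry out the chosen action.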