It is impossible to enumerate all failure scenarios, much less protect against all of them. Nevertheless, this page analyzes the impact of various components malfunctioning.
"Core control plane" refers to the common dependency of all control plane components, namely kube-apiserver and etcd.

Cluster | Scenario | Affected objects | What happens without Podseidon | What happens with Podseidon |
---|---|---|---|---|
Core | Data disappearance (e.g. due to etcd data corruption or buggy controllers) | Source workload only | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. | ✅ The PodProtector is not cascade-deleted due to the lack of an explicit `deletionTimestamp`. Cascade deletion of the underlying pods is rejected by the Podseidon webhook. |
Core | Data disappearance | PodProtector only | N/A | |
Core | Data disappearance | PodProtector + source workload/intermediate objects | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. The Podseidon webhook cannot protect the pods once kube-apiserver has sent the deletion event to the webhook's informer. |
Core | Data disappearance | Other dependency objects | | |
Core | Loss of strong consistency | PodProtector | N/A | |
Worker | Data disappearance (e.g. due to etcd data corruption or buggy controllers) | Pod | ❌ Kubelet will kill the pods without warning. This cannot be mitigated without modifying kubelet code. | |
Worker | Data disappearance | Intermediate objects (e.g. ReplicaSet) | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. | ✅ The PodProtector is not cascade-deleted due to the lack of an explicit `deletionTimestamp`. Cascade deletion of the underlying pods is rejected by the Podseidon webhook. |
Worker | Data disappearance | Podseidon ValidatingWebhookConfiguration | N/A | |
Worker | Significant watch cache lag (but any available watch events are still delivered in order) | Pod → Podseidon Aggregator | N/A | With `--aggregator-informer-synctime-algorithm=clock`, this may result in incorrect approval of pod deletions due to the lag between PodProtector admission and event reception. This does not happen with `--aggregator-informer-synctime-algorithm=status`. |
Worker | Loss of strong consistency | Pod → Podseidon Aggregator watch | N/A | |
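
To make the watch-cache-lag row concrete, here is a deliberately simplified Go model. Everything in it is hypothetical: `protector`, `effectiveAvailable`, `admitDeletion` and `secondDeletionAdmitted` are illustrative names, not Podseidon's API, and the mechanism (the webhook discounts only the deletions it approved after the aggregator's sync time) is an assumption. Under that assumption, a wall-clock sync time lets a lagging aggregator stamp a stale pod count with a recent time, so a deletion it never observed is forgotten and a second deletion is wrongly approved. The event-based branch stands in for `status`, which the table says avoids the issue; how `status` really derives its sync time is also an assumption here.

```go
package main

import (
	"fmt"
	"time"
)

// protector is a hypothetical, simplified view of a PodProtector as the
// webhook might see it. Field names are illustrative, not Podseidon's API.
type protector struct {
	minAvailable int
	aggregated   int         // available pods last reported by the aggregator
	syncTime     time.Time   // aggregator claims all events up to here are counted
	admitted     []time.Time // when the webhook approved each pod deletion
}

// effectiveAvailable discounts deletions approved after syncTime, since the
// aggregated count is assumed not to reflect them yet.
func (p *protector) effectiveAvailable() int {
	n := p.aggregated
	for _, at := range p.admitted {
		if at.After(p.syncTime) {
			n--
		}
	}
	return n
}

// admitDeletion approves a deletion only if minAvailable pods would remain.
func (p *protector) admitDeletion(now time.Time) bool {
	if p.effectiveAvailable()-1 < p.minAvailable {
		return false
	}
	p.admitted = append(p.admitted, now)
	return true
}

// secondDeletionAdmitted replays a lagging watch and reports whether a second
// deletion is approved. clockSync selects a wall-clock sync time; otherwise the
// sync time is the latest event the aggregator actually observed.
func secondDeletionAdmitted(clockSync bool) bool {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	at := func(sec int) time.Time { return t0.Add(time.Duration(sec) * time.Second) }

	// 3 available pods, at most 1 may be unavailable.
	p := &protector{minAvailable: 2, aggregated: 3, syncTime: at(0)}
	p.admitDeletion(at(10)) // 3-1 >= 2: approved

	// At t=20s the aggregator reports again, but its watch has only delivered
	// events up to t=5s, so it still counts 3 available pods.
	p.aggregated = 3
	if clockSync {
		p.syncTime = at(20) // wall-clock time of the report
	} else {
		p.syncTime = at(5) // time of the latest observed event
	}
	return p.admitDeletion(at(21))
}

func main() {
	fmt.Println("clock-based:", secondDeletionAdmitted(true))  // true: wrongly approved, 1 pod left
	fmt.Println("event-based:", secondDeletionAdmitted(false)) // false: correctly rejected
}
```

In the clock-based run, the deletion approved at t=10s falls before the reported sync time of t=20s, so the model assumes it is already reflected in the count of 3 and approves a second deletion, leaving 1 pod against a minimum of 2.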

Component | Scenario | Consequence |
---|---|---|
Generator | Not working | Incorrect rejection after scaling down. |
Generator | Incorrect logic | |
Aggregator | Not working | New available pods are not observed in aggregation. Both this and a non-working generator may disrupt normal operations due to incorrect rejections from the Podseidon webhook. |
Aggregator | Incorrect logic | |
Webhook | Unavailable | |
Webhook | Incorrect logic | |
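
The generator and aggregator consequences above can be illustrated with a small Go sketch. It assumes, hypothetically, that the PodProtector requirement is `replicas - maxUnavailable` and that the webhook admits a deletion only if one fewer available pod still meets it; `desiredMinAvailable` and `allowDeletion` are illustrative names, not Podseidon functions.

```go
package main

import "fmt"

// desiredMinAvailable is a hypothetical stand-in for what the generator would
// write into a PodProtector: replicas minus the allowed maxUnavailable.
func desiredMinAvailable(replicas, maxUnavailable int) int {
	if m := replicas - maxUnavailable; m > 0 {
		return m
	}
	return 0
}

// allowDeletion is a hypothetical stand-in for the webhook check: deleting one
// more available pod must leave at least minAvailable pods available.
func allowDeletion(observedAvailable, minAvailable int) bool {
	return observedAvailable-1 >= minAvailable
}

func main() {
	const maxUnavailable = 1

	// Healthy: workload scaled down from 10 to 4 replicas, all 4 available.
	fmt.Println(allowDeletion(4, desiredMinAvailable(4, maxUnavailable))) // true

	// Generator not working: the PodProtector still reflects 10 replicas.
	fmt.Println(allowDeletion(4, desiredMinAvailable(10, maxUnavailable))) // false

	// Aggregator not working: the 4th pod became available but was never
	// observed, so only 3 available pods are counted.
	fmt.Println(allowDeletion(3, desiredMinAvailable(4, maxUnavailable))) // false
}
```

Both failure modes err on the side of rejection under these assumptions: the pods stay up, but legitimate deletions (for example during a rollout) are blocked until the stale component recovers.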

✅ Disruptions to the chain between the source of truth (the main workload) and pods shall not result in service disruption beyond the level permitted by `maxUnavailable`.
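
The guarantee above, and the contrast between the ✅ "Source workload only" row and the ❌ "PodProtector + source workload/intermediate objects" row, can be sketched as a GC controller trying to delete every pod while a webhook enforces the budget. This is a hypothetical model: `cascadeDelete` is not a Podseidon function, the budget check `replicas - maxUnavailable` is an assumption, and all pods are assumed available and never replaced (their owner is gone).

```go
package main

import "fmt"

// cascadeDelete simulates a GC controller deleting every pod of a workload, one
// at a time, while a hypothetical webhook admits each deletion only if the
// remaining available pods stay at or above replicas - maxUnavailable.
// protectorPresent=false models the case where the PodProtector itself was lost.
func cascadeDelete(replicas, maxUnavailable int, protectorPresent bool) (deleted int) {
	minAvailable := replicas - maxUnavailable
	available := replicas
	for i := 0; i < replicas; i++ {
		if protectorPresent && available-1 < minAvailable {
			continue // webhook rejects; GC would retry later
		}
		available--
		deleted++
	}
	return deleted
}

func main() {
	fmt.Println(cascadeDelete(10, 2, true))  // 2: bounded by maxUnavailable
	fmt.Println(cascadeDelete(10, 2, false)) // 10: nothing left to protect the pods
}
```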