# Coverage analysis

It is impossible to enumerate all failure scenarios, much less protect against all of them. However, this page analyzes the impact of various components malfunctioning.

## Core control plane

"Core control plane" refers to the common dependency of all control plane components, namely kube-apiserver and etcd.

| Cluster | Scenario | Affected objects | What happens without Podseidon | What happens with Podseidon |
|---------|----------|------------------|--------------------------------|------------------------------|
| Core | Data disappearance (e.g. due to etcd data corruption or buggy controllers) | Source workload only | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. | ✅ The PodProtector is not cascade-deleted because it has no explicit deletionTimestamp. Cascade deletion of the underlying pods is rejected by the Podseidon webhook. |
| Core | Data disappearance | PodProtector only | N/A | ⚠️ The webhook can no longer reject pod deletion, but controllers will not actively try to delete the pods since the normal path is unaffected. |
| Core | Data disappearance | PodProtector + source workload/intermediate objects | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. The Podseidon webhook is unable to protect the pods if kube-apiserver has already sent the deletion event to its informer. |
| Core | Data disappearance | Other dependency objects | ⚠️ No direct impact on running pods, but recreated pods cannot start correctly. | ⚠️ No direct impact on running pods, but recreated pods cannot start correctly. |
| Core | Loss of strong consistency | PodProtector | N/A | ⚠️ No direct impact on normal operations, but the webhook may incorrectly allow pod deletion if the apiserver returns 200 OK to conflicting PodProtector status updates. |
| Worker | Data disappearance (e.g. due to etcd data corruption or buggy controllers) | Pod | ❌ Kubelet will kill the pods without warning. This cannot be mitigated without modifying kubelet code. | ❌ Kubelet will kill the pods without warning. This cannot be mitigated without modifying kubelet code. |
| Worker | Data disappearance | Intermediate objects (e.g. ReplicaSet) | ❌ The GC controller (or an equivalent cascade deletion controller) would cascade-delete all pods. | ✅ The PodProtector is not cascade-deleted because it has no explicit deletionTimestamp. Cascade deletion of the underlying pods is rejected by the Podseidon webhook. |
| Worker | Data disappearance | Podseidon ValidatingWebhookConfiguration | N/A | ⚠️ kube-apiserver no longer calls the Podseidon webhook, so protection is lost. Such data disappearance is often correlated with mass pod disappearance, so the pod count drops immediately and the ReplicaSet controller is unlikely to try to delete pods at the same time. |
| Worker | Significant watch cache lag (but any available watch events are still delivered in order) | Pod → Podseidon Aggregator | N/A | ⚠️ Normal operations (such as scaling and eviction) may be disrupted because the Podseidon webhook does not observe new pods becoming available and therefore does not replenish the quota for pod deletion. With `--aggregator-informer-synctime-algorithm=clock`, this may also result in incorrect approval of pod deletion due to the lag between PodProtector admission and event reception; this issue does not happen if `status` is used instead. |
| Worker | Loss of strong consistency | Pod → Podseidon Aggregator watch | N/A | ⚠️ The Aggregator incorrectly invalidates old `admissionHistory` entries that have not yet been observed in the current view of the pod list. The resulting `estimatedAvailableReplicas` is greater than the actual value, leading to incorrect approval of pod deletion (see the sketch below). |
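The last two rows hinge on how the webhook budgets pod deletions against the aggregated PodProtector status. The following Go sketch is illustrative only: the `podProtectorStatus` struct and `canDeletePod` helper are hypothetical simplifications, not the actual Podseidon types. It shows why an inflated `estimatedAvailableReplicas` or a prematurely cleared admission history leads to incorrect approvals, while a lagging aggregation leads to incorrect rejections.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical, simplified view of an aggregated PodProtector status.
// Field names mirror the ones mentioned above; the layout is illustrative only.
type podProtectorStatus struct {
	minAvailable               int32       // pods that must remain available
	estimatedAvailableReplicas int32       // pods the aggregator currently believes are available
	admissionHistory           []time.Time // deletions admitted but not yet reflected in aggregation
}

// canDeletePod sketches the webhook decision: approve a deletion only if,
// after accounting for deletions that were already admitted but are not yet
// visible to the aggregator, availability stays at or above minAvailable.
func canDeletePod(s podProtectorStatus) bool {
	pending := int32(len(s.admissionHistory))
	return s.estimatedAvailableReplicas-pending-1 >= s.minAvailable
}

func main() {
	status := podProtectorStatus{minAvailable: 9, estimatedAvailableReplicas: 10}

	// With an accurate aggregation, exactly one deletion is approved...
	fmt.Println(canDeletePod(status)) // true

	// ...and recording it in the admission history blocks further deletions
	// until the aggregator observes a replacement pod become available.
	status.admissionHistory = append(status.admissionHistory, time.Now())
	fmt.Println(canDeletePod(status)) // false

	// If a lossy watch clears the history entry prematurely, or inflates
	// estimatedAvailableReplicas, the same call returns true again and one
	// pod too many can be deleted; if the aggregation lags and never counts
	// new pods, legitimate deletions keep being rejected instead.
}
```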

## Podseidon components

| Component | Scenario | Consequence |
|-----------|----------|-------------|
| Generator | Not working | ⚠️ Insufficient protection after scaling up; incorrect rejection after scaling down. |
| Generator | Incorrect logic | ⚠️ Insufficient protection after scaling up; incorrect rejection after scaling down. |
| Aggregator | Not working | ⚠️ False positives in the admission history are not cleared in time, and newly available pods are not observed in aggregation. Both may disrupt normal operations due to incorrect rejections from the Podseidon webhook. |
| Aggregator | Incorrect logic | ⚠️ The admission history may be incorrectly cleared or preserved, or the aggregated replica count may be too large or too small, resulting in incorrect approval or rejection from the Podseidon webhook respectively. |
| Webhook | Unavailable | ⚠️ Pod deletion is denied if `failurePolicy` is set to `Fail` and all instances are unavailable, disrupting normal operations (see the configuration sketch below). |
| Webhook | Incorrect logic | ⚠️ The webhook may incorrectly approve or reject pod deletions. |
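The "Unavailable" consequence depends entirely on the webhook's `failurePolicy`. The following Go sketch uses the standard `k8s.io/api/admissionregistration/v1` types with placeholder names and a placeholder service reference (it is not the manifest shipped with Podseidon) to show the trade-off: `Fail` blocks pod deletion whenever no webhook instance is reachable, while `Ignore` would let deletions through unprotected.

```go
package main

import (
	"fmt"

	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podDeletionWebhook builds an illustrative ValidatingWebhookConfiguration
// that intercepts pod DELETE requests. All names are placeholders.
func podDeletionWebhook() *admissionregistrationv1.ValidatingWebhookConfiguration {
	// Fail: reject deletions while the webhook is unreachable (safe, but may
	// disrupt normal operations). Ignore would allow them through unprotected.
	failurePolicy := admissionregistrationv1.Fail
	sideEffects := admissionregistrationv1.SideEffectClassNone
	scope := admissionregistrationv1.NamespacedScope

	return &admissionregistrationv1.ValidatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "podseidon-example"},
		Webhooks: []admissionregistrationv1.ValidatingWebhook{{
			Name:                    "pod-deletion.podseidon.example.com",
			AdmissionReviewVersions: []string{"v1"},
			SideEffects:             &sideEffects,
			FailurePolicy:           &failurePolicy,
			Rules: []admissionregistrationv1.RuleWithOperations{{
				Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Delete},
				Rule: admissionregistrationv1.Rule{
					APIGroups:   []string{""},
					APIVersions: []string{"v1"},
					Resources:   []string{"pods"},
					Scope:       &scope,
				},
			}},
			ClientConfig: admissionregistrationv1.WebhookClientConfig{
				Service: &admissionregistrationv1.ServiceReference{
					Namespace: "podseidon-system",  // placeholder
					Name:      "podseidon-webhook", // placeholder
				},
			},
		}},
	}
}

func main() {
	cfg := podDeletionWebhook()
	fmt.Println(cfg.Webhooks[0].Name, *cfg.Webhooks[0].FailurePolicy)
}
```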

## Other components

✅ Disruptions to the chain between the source of truth (the main workload) and the pods shall not result in service disruption beyond the level permitted by `maxUnavailable`.