Today, I want to share a story about object deletion in Kubernetes federation. Working through it gave me a better understanding of how object deletion works in Kubernetes, and I hope it helps others as well.
It all starts with a support request:
"When I delete my namespace on federation, the same namespace in the member cluster is not deleted. However, cascade deletion does happen for all other resources, like configmaps."
Before we jump in, let's first see how object deletion on federation is supposed to work. I'll create a new namespace on federation below.
If you pay close attention, you will find that two finalizers are added to it: federation.kubernetes.io/delete-from-underlying-clusters and orphan.
A finalizer is the mechanism that ensures proper cleanup in Kubernetes: an object won't disappear from storage as long as any finalizers remain on it. In other words, each finalizer is responsible for cleaning up a certain resource. What are these two finalizers responsible for, then? Let's take a look at the code where they are added.
// Ensures that the given object has both FinalizerDeleteFromUnderlyingClusters
// and FinalizerOrphan finalizers.
// We do this so that the controller is always notified when a federation resource is deleted.
// If user deletes the resource with nil DeleteOptions or
// DeletionOptions.OrphanDependents = true then the apiserver removes the orphan finalizer
// and deletion helper does a cascading deletion.
// Otherwise, deletion helper just removes the federation resource and orphans
// the corresponding resources in underlying clusters.
// This method should be called before creating objects in underlying clusters.
func (dh *DeletionHelper) EnsureFinalizers(obj runtime.Object) (
	runtime.Object, error) {
	finalizers := sets.String{}
	hasFinalizer, err := finalizersutil.HasFinalizer(obj, FinalizerDeleteFromUnderlyingClusters)
	if err != nil {
		return obj, err
	}
	if !hasFinalizer {
		finalizers.Insert(FinalizerDeleteFromUnderlyingClusters)
	}
	hasFinalizer, err = finalizersutil.HasFinalizer(obj, metav1.FinalizerOrphanDependents)
	if err != nil {
		return obj, err
	}
	if !hasFinalizer {
		finalizers.Insert(metav1.FinalizerOrphanDependents)
	}
	if finalizers.Len() != 0 {
		glog.V(2).Infof("Adding finalizers %v to %s", finalizers.List(), dh.objNameFunc(obj))
		return dh.addFinalizers(obj, finalizers)
	}
	return obj, nil
}
To summarize, the finalizer federation.kubernetes.io/delete-from-underlying-clusters ensures proper cleanup in member clusters, that is, cascade deletion. However, we do not always want cascade deletion. In some cases, we just want the object to be deleted from federation while leaving the objects in member clusters intact.
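The general finalizer mechanism described earlier can be pictured with a small sketch. This is not Kubernetes code: the `Object` type and the `RequestDelete` and `RemoveFinalizer` helpers are hypothetical names made up for illustration, and the `Deleting` flag merely stands in for a set deletion timestamp.

```go
package main

// Object is a hypothetical, heavily simplified stand-in for a Kubernetes
// object; only the fields needed to illustrate finalizers are modeled.
type Object struct {
	Name       string
	Finalizers []string
	Deleting   bool // stands in for a non-nil deletionTimestamp
}

// RequestDelete marks the object as being deleted and reports whether it
// can be removed from storage right away (only when no finalizers remain).
func RequestDelete(o *Object) bool {
	o.Deleting = true
	return len(o.Finalizers) == 0
}

// RemoveFinalizer drops the named finalizer and reports whether the object
// can now be removed from storage: it must already be marked as deleting
// and must have no finalizers left.
func RemoveFinalizer(o *Object, name string) bool {
	kept := make([]string, 0, len(o.Finalizers))
	for _, f := range o.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
	return o.Deleting && len(kept) == 0
}
```

With both federation finalizers present, `RequestDelete` reports that the object must stay in storage, and only the removal of the last finalizer lets it go.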
Now the question is: how does the controller know whether the object needs a cascade deletion or not? This is where the orphan finalizer comes into play. If the orphan finalizer remains on the object, no cascade deletion happens.
We can also look into the code where the controller removes finalizers:
// Deletes the resources corresponding to the given federated resource from
// all underlying clusters, unless it has the FinalizerOrphan finalizer.
// Removes FinalizerOrphan and FinalizerDeleteFromUnderlyingClusters finalizers
// when done.
// Callers are expected to keep calling this (with appropriate backoff) until
// it succeeds.
func (dh *DeletionHelper) HandleObjectInUnderlyingClusters(obj runtime.Object) (
	...
	hasFinalizer, err := finalizersutil.HasFinalizer(obj, FinalizerDeleteFromUnderlyingClusters)
	if err != nil {
		return obj, err
	}
	if !hasFinalizer {
		glog.V(2).Infof("obj does not have %s finalizer. Nothing to do", FinalizerDeleteFromUnderlyingClusters)
		return obj, nil
	}
	hasOrphanFinalizer, err := finalizersutil.HasFinalizer(obj, metav1.FinalizerOrphanDependents)
	if err != nil {
		return obj, err
	}
	if hasOrphanFinalizer {
		glog.V(2).Infof("Found finalizer orphan. Nothing to do, just remove the finalizer")
		// If the obj has FinalizerOrphan finalizer, then we need to orphan the
		// corresponding objects in underlying clusters.
		// Just remove both the finalizers in that case.
		finalizers := sets.NewString(FinalizerDeleteFromUnderlyingClusters, metav1.FinalizerOrphanDependents)
		return dh.removeFinalizers(obj, finalizers)
	}
As we can see, if the orphan finalizer is still on the object, the controller removes both the orphan finalizer and the federation.kubernetes.io/delete-from-underlying-clusters finalizer at the same time, without touching the member clusters.
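The controller's decision boils down to a three-way choice based on which finalizers are present. The sketch below restates it as a pure function; `Action`, `DecideAction`, and `hasFinalizer` are hypothetical names for illustration only, and the string "orphan" is assumed to be the value of metav1.FinalizerOrphanDependents.

```go
package main

const (
	finalizerDeleteFromUnderlyingClusters = "federation.kubernetes.io/delete-from-underlying-clusters"
	finalizerOrphan                       = "orphan" // assumed value of metav1.FinalizerOrphanDependents
)

// Action is a hypothetical label for what the deletion helper ends up doing.
type Action string

const (
	ActionNothing Action = "nothing" // not managed by the deletion helper
	ActionOrphan  Action = "orphan"  // drop both finalizers, leave member clusters alone
	ActionCascade Action = "cascade" // delete from member clusters, then drop the finalizer
)

// hasFinalizer reports whether name is present in finalizers.
func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// DecideAction mirrors the branch order of the controller code above.
func DecideAction(finalizers []string) Action {
	if !hasFinalizer(finalizers, finalizerDeleteFromUnderlyingClusters) {
		return ActionNothing
	}
	if hasFinalizer(finalizers, finalizerOrphan) {
		return ActionOrphan
	}
	return ActionCascade
}
```

Seen this way, cascade deletion is only reachable when something has stripped the orphan finalizer before the controller looks at the object.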
You may ask: why does the controller add both finalizers in the first place? Does that mean it will always do a non-cascade deletion? The logic seems weird, but it actually is not. Just keep reading.
Let's reason about it first. If we want cascade deletion, then someone must remove the orphan finalizer before the controller works on that deletion, and this must always happen before the controller receives the deletion event.
This turns out to be true. When we send a delete request to the API server, the registry implementation for namespaces decides what to do with those finalizers:
// Delete enforces life-cycle rules for namespace termination
func (r *REST) Delete(ctx context.Context, name string, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
	...
	// upon first request to delete, we switch the phase to start namespace termination
	// TODO: enhance graceful deletion's calls to DeleteStrategy to allow phase change and finalizer patterns
	if namespace.DeletionTimestamp.IsZero() {
		err = r.store.Storage.GuaranteedUpdate(
			ctx, key, out, false, &preconditions,
			storage.SimpleUpdate(func(existing runtime.Object) (runtime.Object, error) {
				...
				// Remove orphan finalizer if options.OrphanDependents = false.
				if options.OrphanDependents != nil && *options.OrphanDependents == false {
					// remove Orphan finalizer.
					newFinalizers := []string{}
					for i := range existingNamespace.ObjectMeta.Finalizers {
						finalizer := existingNamespace.ObjectMeta.Finalizers[i]
						if string(finalizer) != metav1.FinalizerOrphanDependents {
							newFinalizers = append(newFinalizers, finalizer)
						}
					}
					existingNamespace.ObjectMeta.Finalizers = newFinalizers
				}
				return existingNamespace, nil
				...
Bingo. Now we can see that the namespace store removes the orphan finalizer when *options.OrphanDependents == false. This explains how cascade deletion works on federation:
1. The user sends a delete request via kubectl delete.
2. The API server removes the orphan finalizer when options.orphanDependents == false.
3. The controller then deletes the objects in member clusters and removes federation.kubernetes.io/delete-from-underlying-clusters once all the deletions have succeeded.
If we do not want cascade deletion, we must leave options.orphanDependents unset or set it to true. In that case, the orphan finalizer remains, and the controller removes both finalizers without touching member clusters.
Now let's get back to our problem: why is it broken?
As observed, the delete options are {"propagationPolicy":"Background"}, which is the default on kubectl v1.14.0. Since the request does not specify orphanDependents, the namespace store never removes the orphan finalizer, and thus the cascade deletion does not happen.
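To make the failure concrete, here is a minimal sketch that mimics only the finalizer handling of the namespace registry shown earlier. `DeleteOptions` is a hypothetical, trimmed-down mirror of metav1.DeleteOptions, and `NamespaceFinalizersAfterDelete` is a made-up helper name; the point is that the propagation policy is never consulted.

```go
package main

const finalizerOrphan = "orphan" // assumed value of metav1.FinalizerOrphanDependents

// DeleteOptions is a hypothetical, trimmed-down mirror of metav1.DeleteOptions.
type DeleteOptions struct {
	OrphanDependents  *bool
	PropagationPolicy *string
}

// NamespaceFinalizersAfterDelete returns the finalizers left on a namespace
// after the delete request, following the namespace registry logic above:
// the orphan finalizer is removed only when OrphanDependents is explicitly
// false. PropagationPolicy is deliberately ignored, as in that code.
func NamespaceFinalizersAfterDelete(finalizers []string, opts DeleteOptions) []string {
	removeOrphan := opts.OrphanDependents != nil && !*opts.OrphanDependents
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if removeOrphan && f == finalizerOrphan {
			continue
		}
		kept = append(kept, f)
	}
	return kept
}
```

Passing options that carry only a Background propagation policy leaves the orphan finalizer in place, so the controller later takes the orphaning branch. Passing OrphanDependents set to false removes it and allows cascade deletion.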
But why are other resources, like configmaps, still fine? Do they have a different registry store implementation? Well, they do!
// Delete removes the item from storage.
func (e *Store) Delete(ctx context.Context, name string, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
	...
	// Handle combinations of graceful deletion and finalization by issuing
	// the correct updates.
	shouldUpdateFinalizers, _ := deletionFinalizersForGarbageCollection(ctx, e, accessor, options)
	// TODO: remove the check, because we support no-op updates now.
	if graceful || pendingFinalizers || shouldUpdateFinalizers {
		err, ignoreNotFound, deleteImmediately, out, lastExisting = e.updateForGracefulDeletionAndFinalizers(ctx, name, key, options, preconditions, obj)
	}
	...
}
// deletionFinalizersForGarbageCollection analyzes the object and delete options
// to determine whether the object is in need of finalization by the garbage
// collector. If so, returns the set of deletion finalizers to apply and a bool
// indicating whether the finalizer list has changed and is in need of updating.
//
// The finalizers returned are intended to be handled by the garbage collector.
// If garbage collection is disabled for the store, this function returns false
// to ensure finalizers aren't set which will never be cleared.
func deletionFinalizersForGarbageCollection(ctx context.Context, e *Store, accessor metav1.Object, options *metav1.DeleteOptions) (bool, []string) {
	...
	shouldOrphan := shouldOrphanDependents(ctx, e, accessor, options)
	...
	newFinalizers := []string{}
	// first remove both finalizers, add them back if needed.
	for _, f := range accessor.GetFinalizers() {
		if f == metav1.FinalizerOrphanDependents || f == metav1.FinalizerDeleteDependents {
			continue
		}
		newFinalizers = append(newFinalizers, f)
	}
	if shouldOrphan {
		newFinalizers = append(newFinalizers, metav1.FinalizerOrphanDependents)
	}
	...
}
// shouldOrphanDependents returns true if the finalizer for orphaning should be set
// updated for FinalizerOrphanDependents. In the order of highest to lowest
// priority, there are three factors affect whether to add/remove the
// FinalizerOrphanDependents: options, existing finalizers of the object,
// and e.DeleteStrategy.DefaultGarbageCollectionPolicy.
func shouldOrphanDependents(ctx context.Context, e *Store, accessor metav1.Object, options *metav1.DeleteOptions) bool {
	// Get default GC policy from this REST object type
	gcStrategy, ok := e.DeleteStrategy.(rest.GarbageCollectionDeleteStrategy)
	var defaultGCPolicy rest.GarbageCollectionPolicy
	if ok {
		defaultGCPolicy = gcStrategy.DefaultGarbageCollectionPolicy(ctx)
	}
	if defaultGCPolicy == rest.Unsupported {
		// return false to indicate that we should NOT orphan
		return false
	}
	// An explicit policy was set at deletion time, that overrides everything
	if options != nil && options.OrphanDependents != nil {
		return *options.OrphanDependents
	}
	if options != nil && options.PropagationPolicy != nil {
		switch *options.PropagationPolicy {
		case metav1.DeletePropagationOrphan:
			return true
		case metav1.DeletePropagationBackground, metav1.DeletePropagationForeground:
			return false
		}
	}
	// If a finalizer is set in the object, it overrides the default
	// validation should make sure the two cases won't be true at the same time.
	finalizers := accessor.GetFinalizers()
	for _, f := range finalizers {
		switch f {
		case metav1.FinalizerOrphanDependents:
			return true
		case metav1.FinalizerDeleteDependents:
			return false
		}
	}
	// Get default orphan policy from this REST object type if it exists
	if defaultGCPolicy == rest.OrphanDependents {
		return true
	}
	return false
}
I feel guilty about pasting so much code above, but I tried to include as little as possible. Put simply, the generic store does the following:
1. It removes the orphan finalizer first.
2. It adds the orphan finalizer back if one of the following holds, checked in this order:
   - options.orphanDependents is true (an explicit false means it is not added back).
   - options.propagationPolicy is Orphan (Background or Foreground means it is not added back).
   - Neither field is set in options, and the orphan finalizer is already present on the object.
   - The default GC policy in DeleteStrategy is OrphanDependents.
Since the client passes the delete options as propagationPolicy: Background, the orphan finalizer is not added back, which means the controller will do cascade deletion.
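The same rules can be exercised as a condensed, self-contained sketch. It is an approximation of the generic store logic quoted above, not the real implementation: `DeleteOptions`, `ShouldOrphan`, and `FinalizersAfterDelete` are hypothetical names, the default GC policy is reduced to a single `defaultOrphan` flag (the rest.Unsupported case is left out), and the parts elided from the quoted code, such as re-adding the foreground deletion finalizer, are not modeled.

```go
package main

const (
	finalizerOrphan           = "orphan"             // assumed value of metav1.FinalizerOrphanDependents
	finalizerDeleteDependents = "foregroundDeletion" // assumed value of metav1.FinalizerDeleteDependents
)

// DeleteOptions is a hypothetical, trimmed-down mirror of metav1.DeleteOptions.
type DeleteOptions struct {
	OrphanDependents  *bool
	PropagationPolicy *string
}

// ShouldOrphan applies the priority order summarized above: explicit
// orphanDependents, then propagationPolicy, then existing finalizers,
// then the store default.
func ShouldOrphan(opts *DeleteOptions, finalizers []string, defaultOrphan bool) bool {
	if opts != nil && opts.OrphanDependents != nil {
		return *opts.OrphanDependents
	}
	if opts != nil && opts.PropagationPolicy != nil {
		switch *opts.PropagationPolicy {
		case "Orphan":
			return true
		case "Background", "Foreground":
			return false
		}
	}
	for _, f := range finalizers {
		switch f {
		case finalizerOrphan:
			return true
		case finalizerDeleteDependents:
			return false
		}
	}
	return defaultOrphan
}

// FinalizersAfterDelete strips both GC finalizers, then re-adds the orphan
// finalizer only if ShouldOrphan says so.
func FinalizersAfterDelete(opts *DeleteOptions, finalizers []string, defaultOrphan bool) []string {
	kept := make([]string, 0, len(finalizers)+1)
	for _, f := range finalizers {
		if f == finalizerOrphan || f == finalizerDeleteDependents {
			continue
		}
		kept = append(kept, f)
	}
	if ShouldOrphan(opts, finalizers, defaultOrphan) {
		kept = append(kept, finalizerOrphan)
	}
	return kept
}
```

Given the two federation finalizers and options containing only a Background propagation policy, this sketch returns just federation.kubernetes.io/delete-from-underlying-clusters, which is the state in which the controller performs cascade deletion.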
Now we have figured out how it happened. Reading the code takes a lot of patience, but it is really enjoyable.
Note: The issue has been fixed in kubernetes/kubernetes#76051.