-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for excluding/including nodes from external load balancers before/after reboot #378
Conversation
@evrardjp I added a |
@evrardjp All checks have passed. What is the next step? |
Hello! I will have a look :) |
… before/after restart, added delay to wait for de-registration to finish before rebooting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be better if you could split better your commits, and explain WHY you're doing those changes, in your commit messages.
Here is a good video on commit messages: https://www.youtube.com/watch?v=pU-VasVPNAs
drain(client, node) | ||
|
||
if rebootDelay > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next to that, I would probably use the existing throttle function, maybe rename and refactor it so that it's clearer.
I am worried the users might not understand the difference between all those options we have, tbh, so I think it might be worth clarifying in the CLI arguments.
@@ -40,6 +40,7 @@ var ( | |||
// Command line flags | |||
forceReboot bool | |||
drainTimeout time.Duration | |||
rebootDelay time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best toremove this from this PR, as it's effectively not linked to the PR title.
Maybe we need to fix the current ELB feature to not require the addition of this delay. This way, the intent of the rebootDelay would would be more clear.
If we don't add this feature in a different PR, we should at least make the boundary very clear by having only 2 commits: One for ELB support, one for the reboot-delay. (You will need to squash your changes appropriately).
@@ -104,6 +114,8 @@ func main() { | |||
"when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)") | |||
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0, | |||
"timeout after which the drain is aborted (default: 0, infinite time)") | |||
rootCmd.PersistentFlags().DurationVar(&rebootDelay, "reboot-delay", 0, | |||
"delay reboot for this duration (default: 0, disabled)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"delay reboot for this duration after drain and exclusion from any ELB"? I feel this will be more understandable by someone not reading the code.
@@ -489,6 +547,7 @@ func rebootAsRequired(nodeID string, rebootCommand []string, sentinelCommand []s | |||
if err != nil { | |||
log.Fatalf("Error retrieving node object via k8s API: %v", err) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not add those spaces changes in this commit. If you want to change the formatting to look better for you, please do it in a different commit/PR.
It's a nice contribution, I like it! I can totally see it making its way to the code base. We can improve the support to other cloud vendors in the future. Thank you! |
Thanks for taking a look!
Using the "node.kubernetes.io/exclude-from-external-load-balancers" node label offloads external load balancer de-registration to the K8S control plane so the code should already be able to handle other cloud vendors.
As far as I know, there isn't a way to retrieve the external load balancer status from the K8S API which means that to delay the restart automatically would require writing vendor-specific code. Adding a |
This PR enables nodes that are registered to external load balancers to be restarted gracefully by removing them from the load balancers before reboot and adding them back after. This is accomplished by adding the "node.kubernetes.io/exclude-from-external-load-balancers" label to the node before reboot which triggers de-registration by the kubernetes control plane. Then after reboot the node label is removed which triggers re-registration by the control plane. In addition, a hard-coded reboot delay of 105 seconds is added to allow time for node instances to finish de-registration.
This functionality was added in a bare-bones fashion and should not be merged as-is. Here are some questions:
Patch()
?Thanks for your consideration! Looking forward to hearing your feedback!