-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Module to upgrade servers #24971
base: main
Are you sure you want to change the base?
Module to upgrade servers #24971
Conversation
…workloads creation
2cfa1a9
to
13b0878
Compare
36565ca
to
37fe298
Compare
37fe298
to
9ae5318
Compare
6563504
to
c4fe1b3
Compare
d3b3a9a
to
e15864a
Compare
…e to ensure they are upgraded one by one
e15864a
to
140bb64
Compare
enos/enos-scenario-upgrade.hcl
Outdated
Important: The path where the binary will be placed is hardcoded, according | ||
to Nomads best practices it will be: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably less "best practices" and more "what our packaging does".
enos/modules/upgrade_servers/main.tf
Outdated
inline = [ | ||
"nomad operator snapshot save ${random_pet.upgrade.id}-0.snap", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the -stale
flag, this is always getting forwarded to the leader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me test with the stale
flag and specifying the server address
Co-authored-by: Tim Gross <[email protected]>
Co-authored-by: Tim Gross <[email protected]>
Co-authored-by: Tim Gross <[email protected]>
dcaffe9
to
cb32f51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great!
just a few comments for your consideration, but no blockers.
} | ||
|
||
inline = [ | ||
"sudo systemctl daemon-reload", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't hurt anything, but I don't believe daemon-reload is required unless a systemd unit file is being changed. here only the thing changing is the binary, right? so only need to restart the agent.
// Taking a snapshot forces the cluster to store a new snapshot that will be | ||
// used to restore the cluster after the restart, because it will be the most | ||
// recent available, the resulting file wont be used.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the purpose of the snapshot is just to compact the state store, so that no log replay is necessary when the agent starts up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not just "no log replay necessary" but that we force the server not to use the logs, which don't pass thru the FSM restore code path in the same way.
//////////////////////////////////////////////////////////////////////////////// | ||
// Upgrading the second server | ||
//////////////////////////////////////////////////////////////////////////////// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure would be neat if TF gave us a way to serialize resource creation, alas. hashicorp/terraform#10802
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declarative language..blah..blah...
enos/modules/upgrade_servers/main.tf
Outdated
nomad_addr = var.nomad_addr | ||
ca_file = var.ca_file | ||
cert_file = var.cert_file | ||
key_file = var.key_file | ||
nomad_token = var.nomad_token | ||
platform = var.platform | ||
server_address = var.servers[1] | ||
nomad_local_upgrade_binary = var.nomad_upgraded_binary | ||
ssh_key_path = var.ssh_key_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit-picky: lots of redundant code around here, which is pretty normal for terraform, but...
I get why these have to be separate module
calls, but the only thing that's changing here is the server_address
, right?
what if all the other variables were a single object, so that a map of them all could be stored in a local
? something like this in upgrade_instance/variables.tf
:
variable "server_address" {
description = "IP address of the server that will be updated"
type = string
}
variable "stuff" {
description = "the rest of the stuff"
type = object({
# this object definition ensures all the required stuff is passed in
nomad_addr = string
ca_file = string
cert_file = string
key_file = string
nomad_token = string
platform = string
nomad_local_upgrade_binary = string
ssh_key_path = string
})
}
and here in upgrade_servers/main.tf
:
locals {
stuff = {
nomad_addr = var.nomad_addr
ca_file = var.ca_file
cert_file = var.cert_file
key_file = var.key_file
nomad_token = var.nomad_token
platform = var.platform
nomad_local_upgrade_binary = var.nomad_upgraded_binary
ssh_key_path = var.ssh_key_path
}
}
module upgrade_second_server {
depends_on = [enos_local_exec.take_second_cluster_snapshot]
source = "../upgrade_instance"
server_address = var.servers[1]
stuff = local.stuff
}
I really like this pattern (sometimes, especially with sub-modules like this) -- like passing in a reusable config struct in golang, instead of a bunch of separate func parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did my best to group things that made sense :)
enos/modules/upgrade_servers/main.tf
Outdated
environment = { | ||
NOMAD_ADDR = var.nomad_addr | ||
NOMAD_CACERT = var.ca_file | ||
NOMAD_CLIENT_CERT = var.cert_file | ||
NOMAD_CLIENT_KEY = var.key_file | ||
NOMAD_TOKEN = var.nomad_token | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these calls could also be shortened with a local env map:
locals {
nomad_env = {
NOMAD_ADDR = var.nomad_addr
NOMAD_CACERT = var.ca_file
NOMAD_CLIENT_CERT = var.cert_file
NOMAD_CLIENT_KEY = var.key_file
NOMAD_TOKEN = var.nomad_token
}
}
resource "enos_local_exec" "second_leader_verification" {
depends_on = [module.upgrade_second_server]
environment = local.nomad_env
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im new to terraform, I will take a look :) Thank you!
…s not meant for dynamic files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
/* variable "ca_file" { | ||
description = "A local file path to a PEM-encoded certificate authority used to verify the remote agent's certificate" | ||
type = string | ||
} | ||
|
||
variable "cert_file" { | ||
description = "A local file path to a PEM-encoded certificate provided to the remote agent. If this is specified, key_file or key_pem is also required" | ||
type = string | ||
} | ||
|
||
variable "key_file" { | ||
description = "A local file path to a PEM-encoded private key. This is required if cert_file or cert_pem is specified." | ||
type = string | ||
} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to get rid of these now that we have that top-level tls
object variable? (Which I like better for sure 👍 )
Description
This PR introduces a module to upgrade the servers one by one. It also adds it to the scenario.
Testing & Reproduction steps
Links
Contributor Checklist
changelog entry using the
make cl
command.ensure regressions will be caught.
and job configuration, please update the Nomad website documentation to reflect this. Refer to
the website README for docs guidelines. Please also consider whether the
change requires notes within the upgrade guide.
Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.