Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module to upgrade servers #24971

Open
wants to merge 55 commits into
base: main
Choose a base branch
from
Open

Module to upgrade servers #24971

wants to merge 55 commits into from

Conversation

Juanadelacuesta
Copy link
Member

@Juanadelacuesta Juanadelacuesta commented Jan 28, 2025

Description

This PR introduces a module to upgrade the servers one by one. It also adds it to the scenario.

Testing & Reproduction steps

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

Comment on lines 161 to 162
Important: The path where the binary will be placed is hardcoded, according
to Nomads best practices it will be:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably less "best practices" and more "what our packaging does".

enos/modules/upgrade_instance/variables.tf Outdated Show resolved Hide resolved
enos/modules/upgrade_instance/variables.tf Outdated Show resolved Hide resolved
enos/modules/upgrade_instance/variables.tf Outdated Show resolved Hide resolved
Comment on lines 45 to 47
inline = [
"nomad operator snapshot save ${random_pet.upgrade.id}-0.snap",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the -stale flag, this is always getting forwarded to the leader.

Copy link
Member Author

@Juanadelacuesta Juanadelacuesta Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me test with the stale flag and specifying the server address

gulducat
gulducat previously approved these changes Feb 5, 2025
Copy link
Member

@gulducat gulducat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

just a few comments for your consideration, but no blockers.

}

inline = [
"sudo systemctl daemon-reload",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't hurt anything, but I don't believe daemon-reload is required unless a systemd unit file is being changed. here only the thing changing is the binary, right? so only need to restart the agent.

Comment on lines +31 to +33
// Taking a snapshot forces the cluster to store a new snapshot that will be
// used to restore the cluster after the restart, because it will be the most
// recent available, the resulting file wont be used..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the purpose of the snapshot is just to compact the state store, so that no log replay is necessary when the agent starts up?

Copy link
Member

@tgross tgross Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just "no log replay necessary" but that we force the server not to use the logs, which don't pass thru the FSM restore code path in the same way.

Comment on lines +82 to +84
////////////////////////////////////////////////////////////////////////////////
// Upgrading the second server
////////////////////////////////////////////////////////////////////////////////
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure would be neat if TF gave us a way to serialize resource creation, alas. hashicorp/terraform#10802

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

declarative language..blah..blah...

Comment on lines 109 to 117
nomad_addr = var.nomad_addr
ca_file = var.ca_file
cert_file = var.cert_file
key_file = var.key_file
nomad_token = var.nomad_token
platform = var.platform
server_address = var.servers[1]
nomad_local_upgrade_binary = var.nomad_upgraded_binary
ssh_key_path = var.ssh_key_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-picky: lots of redundant code around here, which is pretty normal for terraform, but...

I get why these have to be separate module calls, but the only thing that's changing here is the server_address, right?

what if all the other variables were a single object, so that a map of them all could be stored in a local? something like this in upgrade_instance/variables.tf:

variable "server_address" {
  description = "IP address of the server that will be updated"
  type        = string
}

variable "stuff" {
  description = "the rest of the stuff"
  type = object({
    # this object definition ensures all the required stuff is passed in
    nomad_addr                 = string
    ca_file                    = string
    cert_file                  = string
    key_file                   = string
    nomad_token                = string
    platform                   = string
    nomad_local_upgrade_binary = string
    ssh_key_path               = string
  })
}

and here in upgrade_servers/main.tf:

locals {
  stuff = {
    nomad_addr                 = var.nomad_addr
    ca_file                    = var.ca_file
    cert_file                  = var.cert_file
    key_file                   = var.key_file
    nomad_token                = var.nomad_token
    platform                   = var.platform
    nomad_local_upgrade_binary = var.nomad_upgraded_binary
    ssh_key_path               = var.ssh_key_path
  }
}

module upgrade_second_server {
  depends_on = [enos_local_exec.take_second_cluster_snapshot]

  source = "../upgrade_instance"

  server_address = var.servers[1]
  stuff          = local.stuff
}

I really like this pattern (sometimes, especially with sub-modules like this) -- like passing in a reusable config struct in golang, instead of a bunch of separate func parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did my best to group things that made sense :)

Comment on lines 125 to 131
environment = {
NOMAD_ADDR = var.nomad_addr
NOMAD_CACERT = var.ca_file
NOMAD_CLIENT_CERT = var.cert_file
NOMAD_CLIENT_KEY = var.key_file
NOMAD_TOKEN = var.nomad_token
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these calls could also be shortened with a local env map:

locals {
  nomad_env = {
    NOMAD_ADDR        = var.nomad_addr
    NOMAD_CACERT      = var.ca_file
    NOMAD_CLIENT_CERT = var.cert_file
    NOMAD_CLIENT_KEY  = var.key_file
    NOMAD_TOKEN       = var.nomad_token
  }
}

resource "enos_local_exec" "second_leader_verification" {
  depends_on = [module.upgrade_second_server]

  environment = local.nomad_env

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im new to terraform, I will take a look :) Thank you!

tgross
tgross previously approved these changes Feb 5, 2025
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Comment on lines +52 to +66
/* variable "ca_file" {
description = "A local file path to a PEM-encoded certificate authority used to verify the remote agent's certificate"
type = string
}

variable "cert_file" {
description = "A local file path to a PEM-encoded certificate provided to the remote agent. If this is specified, key_file or key_pem is also required"
type = string
}

variable "key_file" {
description = "A local file path to a PEM-encoded private key. This is required if cert_file or cert_pem is specified."
type = string
}
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to get rid of these now that we have that top-level tls object variable? (Which I like better for sure 👍 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants