(PE-40163) automate recovery of failed postgres server #537

Open: wants to merge 4 commits into main

davidmalloncares (Contributor)

Summary

This PR automates the recovery of a failed PostgreSQL server node by combining all the steps detailed here into a single Bolt plan. To test it, provision an environment with a PE primary server and replica, a PostgreSQL server and replica, and at least one compiler. The plan can then be run, for example, with:

bolt plan run peadm::replace_failed_postgresql \
  primary_host=nasty-missing.delivery.puppetlabs.net \
  replica_host=alma-drying.delivery.puppetlabs.net \
  working_postgresql_host=frail-streak.delivery.puppetlabs.net \
  failed_postgresql_host=gooey-flatiron.delivery.puppetlabs.net \
  replacement_postgresql_host=crooked-mail.delivery.puppetlabs.net --no-host-key-check

Additional Context

Tests are being handled in a separate ticket.

Related Issues (if any)

Checklist

  • 🟢 Spec tests.
  • 🟢 Acceptance tests.

Changes include test coverage?

  • Yes
  • Not needed

Have you updated the documentation?

  • Yes, I've updated the appropriate docs
  • Not needed

run_command("/opt/puppetlabs/bin/puppet node purge ${$failed_postgresql_host}", $primary_host)

# Run peadm::add_database plan to deploy replacement PE-PostgreSQL server
run_plan('peadm::add_database', targets => $replacement_postgresql_host,
davidmalloncares (Contributor Author), Jan 28, 2025

@ragingra there seems to be an issue with this line. If I run this plan with this line commented out and then run the equivalent command manually, it all works OK. Do you see anything obvious in how I am calling it here versus how it is called manually?

bolt plan run peadm::add_database -t <replacement-postgres-server-fqdn> primary_host=<primary-server-fqdn>

davidmalloncares (Contributor Author)

@ragingra I found a fix for the issue I was having, and all the tests are finally green. Could you take a wee look? :)

@davidmalloncares davidmalloncares marked this pull request as ready for review February 5, 2025 22:28
@davidmalloncares davidmalloncares requested review from a team as code owners February 5, 2025 22:28