Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(PE-39352) Update backup restore plans for hac database #532

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jpartlow
Copy link
Contributor

@jpartlow jpartlow commented Dec 12, 2024

Summary

PE 2023.7.0 added the host-action-collector service and the
backup/restore plans need to be made aware of the pe-hac database.

Likewise 2025.0.0 adds the patching service and the plans need a similar update for the pe-patching database.

This patch adds a version test, best on the addition of pe_version check
of /opt/puppetlabs/server/pe_build in the peadm::get_peadm_config task.

If >=2025.0, then pe-hac and pe-patching is added to the defaults returned by the
various recover_opts functions that define which resource backup/restore
should wrangle. If >=2023.7.0, but < 2025.0 then just pe-hac is added.

The patch does not attempt to version test the user's custom override
hash of selections, on the assumption that they know, or should be
allowed to specify whatever is needed.

If pe_version is not found (damaged cluster, missing from
peadm_config.json), the patch defaults to assuming earliest version,
which will skip pe-hac/patching. This favors the plans running without error at
the cost of possibly missing pe-hac/patching in a newer installation.
Alternately, we could reverse this and incur failures in damaged older
clusters requiring use of $custom to work around.

Finally there is an edge case it does not address which is the case of
newer pe versions with pe-hac restoring tarballs that were intended to
include all databases, but which were created with peadm predating this
patch. These will fail the pg-restore pe-hac commands, since pe-hac
won't be in the backup, and again require use of $custom to work around.

Checklist

  • 🟢 Spec tests.
  • 🟢 Acceptance tests.

Changes include test coverage?

  • Yes
  • Not needed

Have you updated the documentation?

  • Yes, I've updated the appropriate docs
  • Not needed

@jpartlow
Copy link
Contributor Author

jpartlow commented Dec 12, 2024

That's odd, specs are passing for me locally. Oh, this is a bolt 4 issue; looks like that's in flux. The rebase pulling in the bolt 4 specs fix sorted that out.

PE 2023.7.0 added the host-action-collector service and the
backup/restore plans need to be made aware of the pe-hac database.

This patch adds a version test, best on the addition of pe_version check
of /opt/puppetlabs/server/pe_build in the peadm::get_peadm_config task.

If >= 2023.7, then pe-hac is added to the defaults returned by the
various recover_opts functions that define which resource backup/restore
should wrangle.

The patch does not attempt to version test the user's custom override
hash of selections, on the assumption that they know, or should be
allowed to specify whatever is needed.

If pe_version is not found (damaged cluster, missing from
peadm_config.json), the patch defaults to assuming earliest version,
which will skip pe-hac. This favors the plans running without error at
the cost of possibly missing pe-hac in a newer installation.
Alternately, we could reverse this and incur failures in damaged older
clusters requiring use of $custom to work around.

Finally there is an edge case it does not address which is the case of
newer pe versions with pe-hac restoring tarballs that were intended to
include all databases, but which were created with peadm predating this
patch. These will fail the pg-restore pe-hac commands, since pe-hac
won't be in the backup, and again require use of $custom to work around.
@jpartlow jpartlow force-pushed the pe-39352-update-backup-restore-plans-for-hac-database branch from ca0bedb to 94a2ac6 Compare December 12, 2024 18:26
@jpartlow jpartlow marked this pull request as ready for review December 12, 2024 21:13
@jpartlow jpartlow requested review from a team as code owners December 12, 2024 21:13
2025.0 adds the pe-patching-service and pe-patching database. This
commit updates the recovery metadata to account for the pe-patching
database in 2025.0+ versions.
Also updates the peadm::recovery_opts type doc.
@jpartlow jpartlow force-pushed the pe-39352-update-backup-restore-plans-for-hac-database branch from f9cf46d to 93adb91 Compare December 12, 2024 21:20
@jpartlow
Copy link
Contributor Author

  • Needs an update to docs/backup_restore.md
  • Needs manual testing.

@jhbuchanan45
Copy link
Contributor

If pe_version is not found (damaged cluster, missing from
peadm_config.json), the patch defaults to assuming earliest version,
which will skip pe-hac/patching.

This might be me misunderstanding the failure case, but if this could result in databases potentially not getting backed up and comes from a damaged install, it seems like we would definitely want an error

@jpartlow
Copy link
Contributor Author

jpartlow commented Dec 13, 2024

If pe_version is not found (damaged cluster, missing from
peadm_config.json), the patch defaults to assuming earliest version,
which will skip pe-hac/patching.

This might be me misunderstanding the failure case, but if this could result in databases potentially not getting backed up and comes from a damaged install, it seems like we would definitely want an error

Yes, the error cases can be tricky, but I think I had a more fundamental misunderstanding of what the plan is doing with the extra pg_dump's in addition to the pg_dump's taken by puppet-backup :config, so I'm going to put a do not merge label on this for now, and seek some clarity in slack.

Or I would if there was a do not merge label :) I'll turn it back into a draft for now instead.

@jpartlow jpartlow marked this pull request as draft December 13, 2024 20:44
@jpartlow jpartlow marked this pull request as ready for review January 10, 2025 19:59
@jpartlow
Copy link
Contributor Author

Alright, I think this pr is good to go as is.

The default 'recovery' backup_type uses puppet-backup --scope=certs,code,config, where the config includes the non pdb databases, and then handles pdb separately to account for the possibility of a separate postgresql pdb host. This case should always work correctly, because the puppet-backup tool functionality is based on the pe version it shipped in and will do the right thing accordingly for hac or patching if present.

The 'custom' backup_type takes duplicate backups of the non-pdb databases, for mysterious reasons, possibly related to unreleased 'migration' functionality. This does have edge cases for mismatched versions for these mysterious duplicate backups, but since the whole point of custom is that you can specify which bits to backup or restore, I don't see the cases being worth more complication to deal with in the plan.

Copy link
Contributor

@timidri timidri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hac and patching database additions for backup and restore look reasonable to me.

There is indeed a confusing situation with the separate database backups as opposed to a monolithic backup of all databases as part of the config backup option of the pe backup tool. A better solution would be to integrate the pe backup tool with PEADM and remove all duplications, but the current solution was the best work-around we found short of this major refactoring.

I don't see any changes in the integration tests - I think we need to revise the backup/restore tests to verify inclusion of patching and hac for the corresponding PE versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants