
Add overview of preparing for DR #3558

Open: wants to merge 14 commits into the master branch

asteflova
Member

What changes are you introducing?

I'm adding a list of recommended disaster recovery plans. Each DR plan description follows the same template to make it easier to compare them and help users decide which one to choose.

Why are you introducing these changes? (Explanation, links to references, issues, etc.)

N/A

Anything else to add? (Considerations, potential downsides, alternative solutions you have explored, etc.)

As a next step, I'd like to provide more detailed instructions for each of the DR methods, ideally with three procedures for each: 1. Steps to prepare, 2. Steps to test, 3. Steps to recover.

Checklists

  • I am okay with my commits getting squashed when you merge this PR.
  • I am familiar with the contributing guidelines.

Please cherry-pick my commits into:

  • Foreman 3.13/Katello 4.15 (EL9 only)
  • Foreman 3.12/Katello 4.14 (Satellite 6.16)
  • Foreman 3.11/Katello 4.13 (orcharhino 6.11 on EL8 only; orcharhino 7.0 on EL8+EL9)
  • Foreman 3.10/Katello 4.12
  • Foreman 3.9/Katello 4.11 (Satellite 6.15; orcharhino 6.8/6.9/6.10)
  • Foreman 3.8/Katello 4.10
  • Foreman 3.7/Katello 4.9 (Satellite 6.14)
  • We do not accept PRs for Foreman older than 3.7.


github-actions bot commented Jan 9, 2025

@asteflova
Member Author

This is just a first draft so things might still change a lot. But I wanted to open it up for early feedback in case anyone notices anything particularly concerning.

asteflova force-pushed the dr-overview branch 3 times, most recently from da0c462 to 70f3e50 on January 13, 2025 10:57
@asteflova
Member Author

Hi @ehelms, can you please review? This is just a prettified version of the draft you shared but I did fill in a few of my own assumptions that need checking. If this gets merged, I would like to follow up with more detailed instructions for each of the scenarios, see the PR's description.

asteflova requested a review from ehelms on January 13, 2025 11:01
asteflova added the Needs tech review (Requires a review from the technical perspective), Needs style review (Requires a review from docs style/grammar perspective), and Not yet reviewed labels and removed the Not yet reviewed label on Jan 13, 2025
asteflova marked this pull request as ready for review on January 13, 2025 11:03
Contributor

maximiliankolb left a comment


two minor suggestions, rest LGTM style-wise.

the overall style is quite different to our documentation but I like it!

asteflova added the style review done (No issues from docs style/grammar perspective) label and removed the Needs style review (Requires a review from docs style/grammar perspective) label on Jan 14, 2025
@asteflova
Member Author

Thanks for the comments @ehelms, I applied your feedback. Can you please re-review and tell me if you have any more?

@ehelms
Member

ehelms commented Jan 14, 2025

I think it's looking good. As the original author of the content, I'd like to get a review from @evgeni.
One area I think would be useful to cover is a short list of pros and cons for each option. In conversations I've had with users, the two key topics have been what the methods are and what the trade-offs of each method are, to help them choose.

@asteflova
Member Author

> One area I think would be useful to cover is a short list of pros and cons for each option. In conversations I've had with users, the two key topics have been what the methods are and what the trade-offs of each method are, to help them choose.

It would indeed be great if we could add pros/cons. "What is the expected impact?" was my take on the cons. One of my earliest drafts also included "When should I choose this option?", which would correspond to pros, but I didn't know what to fill in.

asteflova requested a review from evgeni on January 14, 2025 18:47
@jtruestedt

To add some notes from my experience with different customers:

  • If you keep at least your Pulp content (/var/lib/pulp) on external storage and your backup mechanisms are not fully in sync, make sure that the database backup is not newer than the Pulp content backup. Packages that the database references but that are missing from Pulp content usually cannot be recovered, whereas packages that were already synced to Pulp content but are missing from the database cause no problems.

  • In the active-passive setup with restore, you do not necessarily need an existing passive instance; you can create one when you need it, depending on how quickly you need to recover. The restore works as long as the target instance runs the same Foreman version, and that instance can be a blank one.

  • In the active-active scenario, you can also use content import/export. This has the added advantage that you do not need a direct network connection between the two instances.

  • You can also use a scenario in between the two described active-passive setups: keep the Pulp content on network storage (with its own backups) and create backups on your main instance using foreman-maintain without Pulp content. Such a backup is mainly a database dump and usually takes less than 10 minutes. When restoring, make sure that /var/lib/pulp is mounted before you start the restore. The whole backup and restore process is then much faster because the Pulp content does not need to be extracted.

  • And the most important one: if you plan this kind of disaster recovery, make sure you test it, that you know what to do when you need it, and that your backup includes everything. This should be obvious, but experience shows otherwise ;)
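The first and fourth notes above describe two checks that are easy to get wrong under pressure: the database backup must not be newer than the Pulp content, and /var/lib/pulp must be mounted before restoring a backup that skips Pulp content. The following Python sketch expresses both checks. It assumes you can find out when each backup was taken; the function names and timestamps are hypothetical illustrations, not part of foreman-maintain or any other Foreman tooling.

```python
from datetime import datetime, timezone
import os


def database_backup_is_safe(db_backup_time: datetime, pulp_backup_time: datetime) -> bool:
    """Hypothetical helper: True if the database backup is not newer than the Pulp content.

    An older database is fine: packages synced after the dump only exist on disk.
    A newer database may reference packages missing from Pulp content,
    and those usually cannot be recovered.
    """
    return db_backup_time <= pulp_backup_time


def pulp_content_is_mounted(path: str = "/var/lib/pulp") -> bool:
    """Hypothetical helper: True if the Pulp content directory is a mount point.

    When Pulp content lives on network storage and the backup skips it,
    that storage must be mounted before the restore starts.
    """
    return os.path.ismount(path)


if __name__ == "__main__":
    # Illustrative timestamps only; substitute the real backup times.
    db_dump = datetime(2025, 1, 14, 2, 0, tzinfo=timezone.utc)
    pulp_snapshot = datetime(2025, 1, 14, 3, 0, tzinfo=timezone.utc)
    print("database backup consistent:", database_backup_is_safe(db_dump, pulp_snapshot))
    print("pulp content mounted:", pulp_content_is_mounted())
```

Comparing timezone-aware timestamps avoids false results when the database and storage backups run on hosts with different local time zones.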

Because I finally installed a spell checker.
Member Author

asteflova left a comment


I implemented part of the feedback, some of it only as comments.

As stated in the PR's description, I intend to follow up on this PR with others that will provide actual DR procedures with actionable steps for users to take. The lines that are commented out seem like a better fit for such procedures, so for now I want to add them only as internal notes.

@asteflova
Member Author

> To add some notes from my experience with different customers:

This is great feedback, thank you!

> * If you keep at least your Pulp content (/var/lib/pulp) on external storage and your backup mechanisms are not fully in sync, make sure that the database backup is not newer than the Pulp content backup. Packages that the database references but that are missing from Pulp content usually cannot be recovered, whereas packages that were already synced to Pulp content but are missing from the database cause no problems.

> * In the active-passive setup with restore, you do not necessarily need an existing passive instance; you can create one when you need it, depending on how quickly you need to recover. The restore works as long as the target instance runs the same Foreman version, and that instance can be a blank one.

> * In the active-active scenario, you can also use content import/export. This has the added advantage that you do not need a direct network connection between the two instances.

> * You can also use a scenario in between the two described active-passive setups: keep the Pulp content on network storage (with its own backups) and create backups on your main instance using foreman-maintain without Pulp content. Such a backup is mainly a database dump and usually takes less than 10 minutes. When restoring, make sure that /var/lib/pulp is mounted before you start the restore. The whole backup and restore process is then much faster because the Pulp content does not need to be extracted.

I think all of this will be most useful to document in a subsequent PR where I intend to describe the actual steps/procedures users need to take when implementing their preferred DR scenario. Perhaps as best practices/recommendations? So I won't work this in just yet and will wait until I start working on the procedures.

> * And the most important one: if you plan this kind of disaster recovery, make sure you test it, that you know what to do when you need it, and that your backup includes everything. This should be obvious, but experience shows otherwise ;)

Yes!! :) I would very much like to include some testing steps -- not in this PR but in those subsequent PRs I mentioned (see also this PR's description for hints on what I'm planning next). Some sort of verification procedure or a checklist that will help users verify that they are well prepared is certainly a good idea.

@asteflova
Member Author

I resolved all comments and hopefully implemented all your feedback @ehelms and @evgeni. Can you please re-review?

Note that some lines are now commented out. I intend to use them in follow-up PRs that will introduce DR procedures. I'd like to keep this PR limited strictly to an overview (= brief descriptions and pros/cons).

I also added sections for Advantages of each DR scenario. They are currently empty because I don't know what these pros/advantages are. Can you please help with that?

Member

ehelms left a comment


I just have the one comment around virtualization. I think this looks like a good step and intro to disaster recovery, especially since I know the plan is to follow up with more details for each scenario.

asteflova added the tech review done (No issues from the technical perspective) label and removed the Needs tech review (Requires a review from the technical perspective) label on Jan 20, 2025
@asteflova
Member Author

I removed the "Advantages" placeholders because we currently don't have any information to provide there.

@asteflova
Member Author

Hi @maximiliankolb, you gave this an ack a while ago but some things have changed since then. Do you want to re-review?

//The IP address can change.
//====

Virtualizing your {ProjectServer}::
Contributor

Lennonka, Jan 20, 2025


Suggested change:
- Virtualizing your {ProjectServer}::
+ .Virtualizing your {ProjectServer}

I would personally prefer informal headings for the plan names but it isn't blocking your PR.

Labels: style review done (No issues from docs style/grammar perspective), tech review done (No issues from the technical perspective)

6 participants