DeleteBackupRequest can trigger kopia cleanup #8365

Open
kaovilai opened this issue Oct 30, 2024 · 10 comments
Labels
Icebox: We see the value, but it is not slated for the next couple releases.
kind/requirement

Comments

@kaovilai
Member

Describe the problem/challenge you have

We want a field in DeleteBackupRequest that forces a Kopia maintenance run during the backup deletion process, so that the deletion has a more immediate effect on storage usage.

Describe the solution you'd like

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@kaovilai kaovilai added this to OADP Oct 30, 2024
@kaovilai
Member Author

You can assign this to me too.

@sseago
Collaborator

sseago commented Oct 30, 2024

I think first we want to have a discussion around whether triggering this directly is desirable. That could result in too many maint jobs running too close together. It might be sufficient to just configure all maintenance jobs to be full and run them every hour or two. Even if we do trigger an immediate full maintenance, blob cleanup won't happen until the next full maint job that runs at least 4 hours later, so an hourly full maint job would result in cleanup within 4 or 5 hours. Adding an additional immediate maint job upon deletion might make it happen half an hour or so sooner, which isn't really much.

@sseago sseago assigned sseago and kaovilai and unassigned sseago Oct 30, 2024
@Lyndon-Li
Contributor

I guess here we want to delete the repo data immediately after the backup is deleted. But that is not how Kopia works:

  • There are many safety margins (for the sub-tasks of maintenance) in a Kopia repo around the deletion of data, indexes, manifests, etc. Even if we run maintenance immediately after the backup is deleted, the data will most likely not be deleted immediately.
  • Keeping the data for a reasonable time is fundamental to how Kopia makes sure the system works correctly. Kopia repo operations are all client-side, so there are not enough resources to guarantee effective deletion of data; keeping the data for some time is a trade-off.
  • In terms of repo storage design, data IO and GC are two separate subsystems; coupling them is not the right design direction, no matter what issue we want to solve.

@kaovilai
Member Author

Here are additional notes @weshayutin and @sseago made on maintenance and on scenarios where a one-time full maintenance might be desirable.

It should be limited to exceptional circumstances – something like "We accidentally backed up a 10TB volume we didn't intend to back up, so we deleted the Velero backup and need to immediately get rid of it in our bucket." Or: "We have found a bug in Velero and full maintenance is not working properly. The data should have been removed days ago but has not been."

I'll ask the team for more context on why this is needed.

@sseago
Collaborator

sseago commented Oct 31, 2024

Velero won't be able to delete this immediately -- as @Lyndon-Li mentioned, Kopia includes safety mechanisms to prevent data that might still be needed from being deleted prematurely, so even if we immediately ran full maint on backup deletion, we'd still have to wait at least 4 hours for a subsequent full maint to actually delete the data. I think the goal here was to deal with the fact that with only once-daily full maintenance, it could take up to 48 hours for garbage collection to complete. However, I'm not convinced that we want to run full maint as part of backup deletion -- I think we can accomplish the same goal by finding a way to run full maintenance more often than every 24 hours. There's a balance here between running too often (leading to cluster performance problems) and not often enough (leading to higher data/storage costs). For most users, a 24-hour full maint cycle is good enough. For users who are more concerned with storage costs than cluster load, something like 12 or 6 hours is probably better. Due to the safety mechanisms built into Kopia, though, running full maintenance more frequently than every 4 hours is a waste of resources, since 4 hours is the minimum time Kopia requires between the first and last GC cycles that mark a blob for deletion before it will actually delete it.
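A simplified worst-case model of the timing described in the comments above, assuming full maintenance runs on a fixed interval and that the only delay is the 4-hour minimum gap between the full run that first marks a blob and the later full run that deletes it. The other Kopia margins mentioned earlier in the thread are deliberately ignored, and `worst_case_cleanup_hours` is a hypothetical helper, not part of Velero or Kopia.

```python
import math


def worst_case_cleanup_hours(interval_h: float, gc_margin_h: float = 4.0) -> float:
    """Upper bound (hours) from backup deletion until the blobs are deleted.

    Simplified model (assumption): full maintenance runs every `interval_h`
    hours, and a blob is only deleted by a full run at least `gc_margin_h`
    hours after the full run that first marked it. Other margins are ignored.
    """
    if interval_h <= 0:
        raise ValueError("interval_h must be positive")
    # Worst case: the backup is deleted just after a full run, so the run
    # that marks its blobs is almost a whole interval away.
    marking_run = interval_h
    # Deletion happens on the first scheduled full run that is at least
    # gc_margin_h after the marking run (always at least one more run).
    runs_after_marking = max(1, math.ceil(gc_margin_h / interval_h))
    return marking_run + runs_after_marking * interval_h


for hours in (1, 6, 12, 24):
    print(f"full maint every {hours:>2}h -> blobs gone within ~{worst_case_cleanup_hours(hours):g}h")
```

Under these assumptions the loop reports 5, 12, 24 and 48 hours, which lines up with the "within 4 or 5 hours" estimate for hourly full maintenance and the "up to 48 hours" figure for once-daily full maintenance.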

@reasonerjt
Contributor

Will we still wanna do this if we choose to shorten the time window for repo maintenance as suggested in #8364 ?

@kaovilai
Member Author

kaovilai commented Nov 5, 2024

I believe it is still an open need, but it is not urgent and may never need a solution.

There are remaining enhancements outside of triggering immediate cleanup, such as:
if the BSL has no backups and the Kopia repo has no snapshots, Velero can delete the <bucket>/<prefix>/<kopia> folder.

@kaovilai
Member Author

kaovilai commented Nov 5, 2024

We could close this as unplanned but continue the discussion if it ever becomes practical to do so.

@kaovilai
Member Author

kaovilai commented Nov 5, 2024

Additionally, we found that if we use the Kopia CLI with --safety=none, we can affect the Kopia repo immediately (without waiting out the safety margins), as long as it is known that there are no running Velero backups.

--safety=none could be documented for users as a workaround, but not implemented in Velero code. If agreed, we can open a documentation issue for that.
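A minimal sketch of that workaround, assuming the `kopia` CLI is already connected to the same repository Velero uses (Kopia may also require it to run as the repository's maintenance owner) and that `kubectl` can read `backups.velero.io` in the `velero` namespace. Only `kopia maintenance run --full --safety=none` and `kubectl get` are real commands here; the helper names `active_backups`, `maintenance_cmd` and `main` are hypothetical. The check is deliberately conservative: any Backup not in a terminal phase blocks the run.

```python
import json
import subprocess

# Backup phases after which Velero no longer writes to the repository.
# Any other phase (including a missing one) is treated as "still running",
# which errs on the side of NOT running unsafe maintenance.
TERMINAL_PHASES = {"Completed", "PartiallyFailed", "Failed", "FailedValidation"}


def active_backups(backup_list: dict) -> list[str]:
    """Return names of Velero Backup objects that are not in a terminal phase."""
    active = []
    for item in backup_list.get("items", []):
        phase = (item.get("status") or {}).get("phase")
        if phase not in TERMINAL_PHASES:
            active.append(item["metadata"]["name"])
    return active


def maintenance_cmd() -> list[str]:
    # Full Kopia maintenance with the safety margins disabled.
    return ["kopia", "maintenance", "run", "--full", "--safety=none"]


def main(namespace: str = "velero") -> None:
    # List Velero Backup custom resources as JSON via kubectl.
    out = subprocess.run(
        ["kubectl", "get", "backups.velero.io", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    running = active_backups(json.loads(out))
    if running:
        raise SystemExit(f"refusing to run: backups still active: {running}")
    # Assumption: the kopia CLI is already connected to Velero's repository.
    subprocess.run(maintenance_cmd(), check=True)
```

Calling `main()` performs the check and then the maintenance. The check is point-in-time: a backup that starts afterwards (for example from a Schedule) is not covered, so schedules would also need to be paused while this runs.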

@reasonerjt
Contributor

--safety=none could be documented for user as a workaround but not implemented in velero code. If agreed, we can open a documentation issue for that.

Please go ahead and open a new issue.
I'm putting this in "icebox" rather than closing it as we discussed in the community mtg.

Projects
None yet
Development

No branches or pull requests

4 participants