Retry should be enabled by default #193
Comments
FWIW, provider-helm does retry on errors other than a failed release. For example, it retries forever if it hits an error while pulling the chart or communicating with the Kube API. I would be supportive of implementing a negative value meaning Also related: https://fluxcd.io/flux/components/helm/helmreleases/#configuring-failure-remediation
I encountered this problem again today, so I'd really like to find a way to make it retry by default. How about a CLI parameter to the provider like
I think there's a mismatch of expectations, Flux has I found this issue reaching for a way to But I think there could be other cases where you want the release to retry when it's in a failed state, without uninstalling. I think Helm has a way to retry a failed upgrade, but a failed install needs to be uninstalled before it can be tried again. Even knowing all that, I was still surprised to see that Crossplane didn't retry on failure; that's definitely what GitOps users expect. But following the Flux model, it should probably not retry indefinitely because, again, Helm is expensive, and retrying forever could easily have swamped the node that helm-controller is running on. Or, with many failing Helm releases, it could even deadlock the control loop, which only has so many goroutines to manage concurrent Helm releases before they start to queue up behind one another. (I don't know if that's exactly how Crossplane's provider-helm works; I assume it works like Flux's Helm Controller, but with the Helm CLI instead of the Helm SDK underneath.)
What happened?
One of Crossplane's best features is that it just keeps retrying when things fail, ever optimistic that something will change in the future to allow things to succeed "next time".
provider-helm is an exception to this: by default it does NOT retry, and there is no way to make it retry forever.
rollbackLimit defaults to nil, which disables retry. And there is no way to set rollbackLimit to "forever", such as by using 0 or -1. The only option is to set it to a "large" number, which will eventually be exceeded.

Ideally the Release would behave like all other Managed Resources and continuously retry until it is successful.

It seems like, to be consistent with other providers and Crossplane itself, rollbackLimit should default to a value that specifies infinite retries, or nil should indicate infinite retries and 0 should disable retries.

rollbackLimit is also not an obvious attribute name for an option that controls retries. Maybe a better solution would be to add a retryEnabled attribute that turns retries on and off, and leave rollbackLimit to specify the number of retries, where nil means no limit.

I'd be glad to push a PR if any of these suggestions are agreeable.
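To make the proposed semantics concrete, here is a minimal Go sketch comparing the current behavior described in this issue with the two suggestions. The function names, the intPtr helper, and the *int / bool field shapes are hypothetical and are not the actual provider-helm API; the sketch only models when a reconciler would decide to attempt another retry.

```go
package main

import "fmt"

// Hypothetical field shapes for illustration; not the real provider-helm types.

// Current behavior as described in this issue: nil disables retries,
// and a positive value allows that many attempts.
func shouldRetryCurrent(rollbackLimit *int, attempts int) bool {
	if rollbackLimit == nil {
		return false
	}
	return attempts < *rollbackLimit
}

// Proposal 1: nil means retry forever, and 0 disables retries.
func shouldRetryProposal1(rollbackLimit *int, attempts int) bool {
	if rollbackLimit == nil {
		return true
	}
	return attempts < *rollbackLimit
}

// Proposal 2: a separate retryEnabled switch; rollbackLimit only caps
// the number of retries, and nil means no cap.
func shouldRetryProposal2(retryEnabled bool, rollbackLimit *int, attempts int) bool {
	if !retryEnabled {
		return false
	}
	if rollbackLimit == nil {
		return true
	}
	return attempts < *rollbackLimit
}

// intPtr is a hypothetical helper for building optional limits.
func intPtr(v int) *int { return &v }

func main() {
	fmt.Println(shouldRetryCurrent(nil, 0))          // false
	fmt.Println(shouldRetryProposal1(nil, 100))      // true
	fmt.Println(shouldRetryProposal1(intPtr(0), 0))  // false
	fmt.Println(shouldRetryProposal2(true, nil, 100)) // true
}
```

Using a pointer distinguishes "unset" (nil) from an explicit 0, which both proposals rely on to express "retry forever" versus "never retry".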
How can we reproduce it?
Deploy a Release