-
Notifications
You must be signed in to change notification settings - Fork 45
Serving operator interferes with serving updates to MutatingWebhookConfiguration #226
Comments
FWIW, I agree with you. The logic that determines whether a resource needs to be updated needs to be improved. At the time, I didn't even know there was a |
It looks like the reason this didn't show up until 0.10 is that for 0.9, the MutatingWebhookConfiguration wasn't part of the release yaml manifest that the operator applies, but rather created by the webhook controller. It was added here: knative/serving#5871. I happen to like that the operator enforces the configuration is in the state it expects (so long as it doesn't break things). However, if this pattern of serving mutating objects that the operator creates is common or something that we expect to happen more in the future, I might have to re-evaluate my viewpoint. If it's not a pattern that we expect to continue in serving, I'd prefer to treat these cases (e.g. serving updating MutatingWebhookConfiguation) as special cases that the operator copes with. @mattmoor Do you have any thoughts on how the operator should treat the MutatingWebhookConfiguration? Do you happen to know if this behavior where serving mutates the resources that were originally created by the serving release yaml happens elsewhere? |
On some level, I'm wondering if this is a bug in the operator or a bug in serving. It seems like a human running "kubectl apply" with the serving yaml multiple time might also have the ability to break the MutatingWebhookConfiguration, although I haven't tested that. Or maybe it's just that kubectl does more work to merge things. |
It happens to at least the secret as well. I believe it will also happen with gateway resources when auto-TLS is enabled.
If the semantics aren’t three-way merge then folks are going to have loads of fun migrating to the Operator 😬
…-M(obile)
On Nov 22, 2019, at 4:55 PM, Garron Moore ***@***.***> wrote:
It looks like the reason this didn't show up until 0.10 is that for 0.9, the MutatingWebhookConfiguration wasn't part of the release yaml manifest that the operator applies, but rather created by the webhook controller. It was added here: knative/serving#5871.
I happen to like that the operator enforces the configuration is in the state it expects (so long as it doesn't break things). However, if this pattern of serving mutating objects that the operator creates is common or something that we expect to happen more in the future, I might have to re-evaluate my viewpoint. If it's not a pattern that we expect to continue in serving, I'd prefer to treat these cases (e.g. serving updating MutatingWebhookConfiguation) as special cases that the operator copes with.
@mattmoor Do you have any thoughts on how the operator should treat the MutatingWebhookConfiguration? Do you happen to know if this behavior where serving mutates the resources that were originally created by the serving release yaml happens elsewhere?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
To follow up on my last comment: I was not able to reproduce this using "kubectl apply" repeatedly. It appears to not attempt any changes to the webhook configuration in subsequent invocations of apply. Even if I manually tweak the MutatingWebhookConfiguration slightly to force kubectl to make changes, it undoes my changes using a patch and leaves the configuration that was put in by the webhook controller alone. |
@garron If the three-way merge gets mature in 1.16 of Kubernetes, will this issue be fixed automatically if we upgrade the Kube to the 1.16 version? |
As noted in the discussion in Slack: I'm leaning towards hot-fixing this at least for the WebhookConfiguration by configuring Manifestival to ignore a set list of types and use "Ensure exists" semantics for those types. We can potentially even extend that to GroupVersionKind + NamespacedName so we could ignore the colliding resources for now. It sure is a workaround but feels enough to me to get us to 1.16 until we have a better way of fixing it? |
Regarding "configuring Manifestival to ignore a set list of types", do we leave it an open option to users, or use the fixed rules implemented by Manifestival for webhook only? |
@houshengbo It'd be down to the client of the library, so our operator can be very specific about when to apply the other strategy. |
Yep - whether it's knServing itself making the modifications or an admin of KnServing I think it's the same high-level issue... where is the source of truth? If sometimes it's in Serving and sometimes it's in the operator then I think this issue will continue to pop up. |
I'd say the source is the operator's CRD. Knative munging it's own resources could be seen as a bug (I think it is) and we only chose this route for upgradeability reasons. We should eradicate all instances of the Knative controllers munging stuff in the release.yaml IMO. |
@markusthoemmes If eradicating all instances of the Knative controllers munging stuff in the release.yamls is technically feasible, it seems like a great direction to go from the perspective of the operator. It keeps the ownership model clear and makes the behavior of the operator much easier to reason about (and hopefully document). The doc that I put together around this basically assumes that the behavior of serving mutating things is necessary. However, I'd be happy to have that not be the case. The workaround that you're suggesting around Ensure Exists semantics is something that I've tried out and works. However, in trying it out, I added serving-specific concepts to manifestival (specific object types to ignore and the release label name), which doesn't seem appropriate to land. Landing it would likely require abstracting the concepts and allowing hooks of some kind into manifestival. |
@garron right, but I feel like a configuration for Manifestival that allows that and then encoding the serving specific bits in the operator is the quickest way forward to unblock 0.10. I think the operations WG should push for such a state in Knative Serving. Do we have a list of things were Serving currently munges with stuff from its own manifest? |
When using the 0.10 serving operator, I've run into issues where after bringing up serving via the CR and trying to deploy a ksvc, the ksvc never comes up. The
Revision
stays at ready status=Unknown, reason=Deploying.It appears that the reason that the revision isn't coming up is because the webhook isn't configured properly. After coming up, serving (the webhook controller, I believe) writes necessary configuration to the
MutatingWebhookConfiguration
webhook.serving.knative.dev
in the top levelwebhooks
key. Shortly after writing the configuration, the operator reconcile loop re-applies it's own configuration for this object, wiping out the configuration needed by the webhook controller. When watching theMutatingWebhookConfiguration
, I do see it eventually get fixed up after some period of time, but my revision still doesn't come up automatically. However, once theMutatingWebhookConfiguration
gets into the correct state, I can deploy services successfully.I did verify that if I update the operator to not touch existing resources (create them only), that I am able to bring up a ksvc as expected. This obviously ins't a long-term solution, but does seem to indicate we may need to be more selective about what or how the operator updates serving resources.
This relates to #163
The text was updated successfully, but these errors were encountered: