Skip to content
This repository has been archived by the owner on Jun 24, 2020. It is now read-only.

Serving operator interferes with serving updates to MutatingWebhookConfiguration #226

Closed
garron opened this issue Nov 18, 2019 · 13 comments · Fixed by #243
Closed

Serving operator interferes with serving updates to MutatingWebhookConfiguration #226

garron opened this issue Nov 18, 2019 · 13 comments · Fixed by #243

Comments

@garron
Copy link
Contributor

garron commented Nov 18, 2019

When using the 0.10 serving operator, I've run into issues where after bringing up serving via the CR and trying to deploy a ksvc, the ksvc never comes up. The Revision stays at ready status=Unknown, reason=Deploying.

It appears that the reason that the revision isn't coming up is because the webhook isn't configured properly. After coming up, serving (the webhook controller, I believe) writes necessary configuration to the MutatingWebhookConfiguration webhook.serving.knative.dev in the top level webhooks key. Shortly after writing the configuration, the operator reconcile loop re-applies it's own configuration for this object, wiping out the configuration needed by the webhook controller. When watching the MutatingWebhookConfiguration, I do see it eventually get fixed up after some period of time, but my revision still doesn't come up automatically. However, once the MutatingWebhookConfiguration gets into the correct state, I can deploy services successfully.

I did verify that if I update the operator to not touch existing resources (create them only), that I am able to bring up a ksvc as expected. This obviously ins't a long-term solution, but does seem to indicate we may need to be more selective about what or how the operator updates serving resources.

This relates to #163

@jcrossley3
Copy link
Contributor

FWIW, I agree with you. The logic that determines whether a resource needs to be updated needs to be improved. At the time, I didn't even know there was a Patch command. ;)

@garron
Copy link
Contributor Author

garron commented Nov 23, 2019

It looks like the reason this didn't show up until 0.10 is that for 0.9, the MutatingWebhookConfiguration wasn't part of the release yaml manifest that the operator applies, but rather created by the webhook controller. It was added here: knative/serving#5871.

I happen to like that the operator enforces the configuration is in the state it expects (so long as it doesn't break things). However, if this pattern of serving mutating objects that the operator creates is common or something that we expect to happen more in the future, I might have to re-evaluate my viewpoint. If it's not a pattern that we expect to continue in serving, I'd prefer to treat these cases (e.g. serving updating MutatingWebhookConfiguation) as special cases that the operator copes with.

@mattmoor Do you have any thoughts on how the operator should treat the MutatingWebhookConfiguration? Do you happen to know if this behavior where serving mutates the resources that were originally created by the serving release yaml happens elsewhere?

@garron
Copy link
Contributor Author

garron commented Nov 23, 2019

On some level, I'm wondering if this is a bug in the operator or a bug in serving. It seems like a human running "kubectl apply" with the serving yaml multiple time might also have the ability to break the MutatingWebhookConfiguration, although I haven't tested that. Or maybe it's just that kubectl does more work to merge things.

@mattmoor
Copy link
Member

mattmoor commented Nov 23, 2019 via email

@garron
Copy link
Contributor Author

garron commented Nov 25, 2019

To follow up on my last comment: I was not able to reproduce this using "kubectl apply" repeatedly. It appears to not attempt any changes to the webhook configuration in subsequent invocations of apply. Even if I manually tweak the MutatingWebhookConfiguration slightly to force kubectl to make changes, it undoes my changes using a patch and leaves the configuration that was put in by the webhook controller alone.

@houshengbo
Copy link

@garron If the three-way merge gets mature in 1.16 of Kubernetes, will this issue be fixed automatically if we upgrade the Kube to the 1.16 version?
Or we still need to implement some patching mechanism with lastOperatorAppliedValue for the objects?

@markusthoemmes
Copy link
Contributor

As noted in the discussion in Slack: I'm leaning towards hot-fixing this at least for the WebhookConfiguration by configuring Manifestival to ignore a set list of types and use "Ensure exists" semantics for those types. We can potentially even extend that to GroupVersionKind + NamespacedName so we could ignore the colliding resources for now.

It sure is a workaround but feels enough to me to get us to 1.16 until we have a better way of fixing it?

@houshengbo
Copy link

houshengbo commented Nov 26, 2019

Regarding "configuring Manifestival to ignore a set list of types", do we leave it an open option to users, or use the fixed rules implemented by Manifestival for webhook only?

@markusthoemmes
Copy link
Contributor

@houshengbo It'd be down to the client of the library, so our operator can be very specific about when to apply the other strategy.

@duglin
Copy link

duglin commented Nov 26, 2019

However, if this pattern of serving mutating objects that the operator creates is common or something that we expect to happen more in the future, I might have to re-evaluate my viewpoint.

Yep - whether it's knServing itself making the modifications or an admin of KnServing I think it's the same high-level issue... where is the source of truth? If sometimes it's in Serving and sometimes it's in the operator then I think this issue will continue to pop up.

@markusthoemmes
Copy link
Contributor

I'd say the source is the operator's CRD. Knative munging it's own resources could be seen as a bug (I think it is) and we only chose this route for upgradeability reasons. We should eradicate all instances of the Knative controllers munging stuff in the release.yaml IMO.

@garron
Copy link
Contributor Author

garron commented Nov 26, 2019

@markusthoemmes If eradicating all instances of the Knative controllers munging stuff in the release.yamls is technically feasible, it seems like a great direction to go from the perspective of the operator. It keeps the ownership model clear and makes the behavior of the operator much easier to reason about (and hopefully document).

The doc that I put together around this basically assumes that the behavior of serving mutating things is necessary. However, I'd be happy to have that not be the case.

The workaround that you're suggesting around Ensure Exists semantics is something that I've tried out and works. However, in trying it out, I added serving-specific concepts to manifestival (specific object types to ignore and the release label name), which doesn't seem appropriate to land. Landing it would likely require abstracting the concepts and allowing hooks of some kind into manifestival.

@markusthoemmes
Copy link
Contributor

@garron right, but I feel like a configuration for Manifestival that allows that and then encoding the serving specific bits in the operator is the quickest way forward to unblock 0.10.

I think the operations WG should push for such a state in Knative Serving. Do we have a list of things were Serving currently munges with stuff from its own manifest?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
6 participants